SUMMARY


The purpose of this dissertation is to develop mathematical
models and solution techniques to find optimal tactics for antisubmarine
warfare (ASW) operations. Specifically, two types of ASW operations
are considered: (1) a hunter-killer force (P1) is searching for a
submarine (P2), and (2) P1 is attacking P2. Both of these types of
operations are formulated as two-person zero-sum games. These
game formulations distinguish this work from the literature, since they
allow P2 as well as P1 to choose tactics.

Both sequential and non-sequential search games are developed.
For one of the non-sequential games, the search region is divided into
n cells. In each play of the game, P1 chooses a cell to search and P2
chooses a cell in which to hide. The resulting payoff is the probability
that P1 detects P2. We assume that P1 attempts to maximize this
probability of detection while P2 attempts to minimize it. Therefore,
the game is zero-sum; furthermore, P2 is thereby given the role
of an evader. We also introduce another similar search game, and we
show how to include secondary objectives and additional information by
extending these games to constrained game formulations.
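As a rough illustration of the n-cell game, the sketch below approximates the value of a zero-sum game from its payoff matrix by Brown-Robinson fictitious play, a simple iterative technique swapped in here for illustration; it is not the solution method developed in this work. Rows are P1's search cells, columns are P2's hiding cells, and entry A[i][j] is an assumed detection probability, so non-zero off-diagonal entries can stand for detection as a function of range. The function name and the example matrix are hypothetical.

```python
def fictitious_play(A, iterations=20000):
    """Approximate the value of the zero-sum game with payoff matrix A
    (rows: P1 maximizes; columns: P2 minimizes) by fictitious play.
    Returns (lower, upper, x, y) with lower <= value <= upper, where x and y
    are the empirical mixed strategies of P1 and P2."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_totals = [0.0] * m   # row_totals[i] = sum of A[i][j] over P2's past cells j
    col_totals = [0.0] * n   # col_totals[j] = sum of A[i][j] over P1's past cells i
    i = 0                    # P1's first cell is arbitrary
    for _ in range(iterations):
        row_counts[i] += 1
        for j in range(n):
            col_totals[j] += A[i][j]
        # P2 hides in the cell that is worst for P1 against P1's history.
        j_best = min(range(n), key=lambda k: col_totals[k])
        col_counts[j_best] += 1
        for r in range(m):
            row_totals[r] += A[r][j_best]
        # P1 searches the cell that is best against P2's history.
        i = max(range(m), key=lambda k: row_totals[k])
    t = float(iterations)
    lower = min(col_totals) / t   # P1's empirical mix guarantees at least this
    upper = max(row_totals) / t   # P2's empirical mix concedes at most this
    x = [c / t for c in row_counts]
    y = [c / t for c in col_counts]
    return lower, upper, x, y


# Hypothetical 3-cell example: detection is likeliest when both players
# choose the same cell and still possible in an adjacent cell.
A = [[0.6, 0.2, 0.0],
     [0.2, 0.5, 0.2],
     [0.0, 0.2, 0.4]]
lower, upper, x, y = fictitious_play(A)
```

The two bounds always bracket the value, because each empirical mixture is a legitimate mixed strategy; the gap between them narrows as the number of iterations grows.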
Sequential games, where a play consists of several moves, are
also developed. When the players move, they determine not only a
payoff but also the probability that the play terminates. For the case
of at most a finite number of moves, optimal strategies are found by
solving a recursive sequence of two-person zero-sum games. For the
infinite-move game, we develop an iterative method to approximate the
solution to within a desired accuracy. Finally, we show that the strategies
which minimax the expected duration of the game must also
maximin the one-step termination probability.
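The finite-move recursion can be sketched for a single-state game in which each move yields the payoff A[i][j] and, with probability Q[i][j], ends play. Under this assumed structure (an illustrative reading, not the exact formulation of this work), the value with k moves remaining satisfies $V_k = \operatorname{val}[A_{ij} + (1 - Q_{ij}) V_{k-1}]$ with $V_0 = 0$. To keep the sketch self-contained, each stage game is restricted to 2 x 2 and solved in closed form; the function names are hypothetical.

```python
def value_2x2(M):
    """Value of the 2 x 2 zero-sum game M (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))   # best guaranteed pure row
    minimax = min(max(a, c), max(b, d))   # best guaranteed pure column
    if maximin == minimax:                # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)   # value with mixed strategies


def finite_move_values(A, Q, moves):
    """Return [V_0, V_1, ..., V_moves] for the recursion
    V_k = val(A[i][j] + (1 - Q[i][j]) * V_{k-1}), V_0 = 0."""
    values = [0.0]
    for _ in range(moves):
        prev = values[-1]
        stage = [[A[i][j] + (1 - Q[i][j]) * prev for j in range(2)]
                 for i in range(2)]
        values.append(value_2x2(stage))
    return values
```

Each step solves one small game instead of the much larger game over complete move sequences, which is the source of the computational saving.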


To study attack operations, we formulate a stochastic game due to
Shapley. In this formulation, a pure strategy
is a tactical plan of action for each possible state of the operation. The
objective is taken to be to minimax either the time or the probability for
P1 to kill P2. We derive two methods to find the solution to this
stochastic game; one method iterates on the strategies, and the other
iterates on the payoffs. One special case which is studied is a
Markovian decision process, and one extension is a constrained
stochastic game.

Finally, we investigate multiple contact problems. Our models
rely on the assumption that the amount of effort which is required to
accomplish a specified mission is a random variable with a known
distribution function. Several objective functions are employed, and
one of the models is a chance-constrained distribution model. By using
a zero-order decision rule, we show that the deterministic equivalent
of this model is a distribution model with integer extreme points;
hence, optimal integer assignments can be obtained with ordinary
non-integer methods.
CHAPTER I - INTRODUCTION


1.1 Problem Setting

In this chapter we delineate the problems studied, survey the
literature, summarize our results, and introduce the elements of
game theory and mathematical programming which are used throughout
this work. This section defines the type of antisubmarine warfare
problems which will be considered. First, we distinguish between
strategy and tactics. Then we examine the tactical environment of a
hunter-killer force and the interaction between it and a submarine.
We close this section by defining the two types of problems which will
be studied.

Tactics and strategy differ in the level of decision making, with
the lowest level of strategic decisions merging into the highest level of
tactical decisions. Furthermore, strategic plans are implemented by
tactical operations. In this way, strategy fixes the environment in which
tactical operations take place. For example, geographic position,
force size, military hardware, and the military mission are usually
fixed factors at the tactical level. Tactics then determine the way in which the
available forces will be used to achieve given military objectives. See
Eccles [1] for further elaboration on these points.
We study tactical situations which are typically encountered by a
hunter-killer force in antisubmarine warfare (ASW). First, the primary
mission of a hunter-killer force is to seek and destroy submarines.*
Second, certain environmental factors in ASW operations are fixed.
Typical fixed factors for the hunter-killer force (P1) are detection
capability, speed, endurance, operating region, and vulnerability. On
the other hand, the submarines (P2) also have a mission, and in many
cases it is desirable for P2 to avoid detection in order to accomplish
its mission. Hence, in many hunter-killer operations P1 and P2 are
in direct conflict, with P1 attempting to detect P2 and with P2
attempting to avoid detection. Our models will deal with these situations
of direct conflict, although in some cases we will also allow secondary
objectives.

We separate the tactical problems confronting P1 into search
problems and contact problems. The essential difference between these
problems is the amount of information which is available to P1. In
search problems P1 has not established a contact with P2, and in contact
situations P1 has a contact. For search problems we will develop
models to determine an optimal distribution of search effort, and for
contact problems we develop models to find an optimal tactical
configuration of the forces for each possible state of information.

*Sternhell and Thorndike [1].

Let us restrict our attention for a moment to search situations.
Typically, P1 and P2 each have two modes of search, active and
passive. In the active mode, detection devices are emitting
electromagnetic radiation and receiving echoes back. The passive mode is
simply a listening mode; no radiation is emitted. If P1 operates in the
active mode and P2 operates in the passive mode, then P2 can detect
P1's signal when the range is too great for the echo to return to P1.
Thus, a passive submarine can detect an active searcher without
divulging its own location. It follows that a submarine may choose
evasive tactics before a contact is established by an active searcher.
Most of the proposed search models in the literature do not allow for a
submarine that reacts to the searcher, but we make this allowance by
utilizing game-theoretic formulations.

Summing up, we model tactical hunter-killer operations, and we
separate these operations into search and contact situations. For
search situations, the search region and detection capabilities are fixed.
Under the assumption that the primary objective of P1 is to detect P2
and that P2 attempts to avoid detection, we determine an optimal
distribution of search effort. In contact situations, we wish to find an optimal
tactical plan for each state of information. Before outlining our models
of these situations and subsequent results, we survey the unclassified
literature.
1.2 Literature

No papers on contact problems have been found. However, a
large literature on search problems is available. Three important
early contributions on search, which encouraged further work, are: Koopman
[1] (1946), [2], [3], and [4]; Bellman [1] (1957); and von Neumann [1]
(1953). Koopman formulated the first published non-sequential problem
on the optimal distribution of search effort; Bellman formulated the
first published sequential search model; and von Neumann formulated
two non-sequential minimax search problems.

The classical work of Koopman [1] and [4] can be stated as
follows: Find a function $\varphi$ which maximizes

$$\begin{aligned} F(\varphi) &= \int_R p(x)\, g(\varphi(x))\, dx \\ \text{subject to:}\quad \int_R \varphi(x)\, dx &= A, \qquad \varphi(x) \ge 0, \quad x \in R \end{aligned} \tag{1}$$

where $\varphi$ is the search density function and $F(\varphi)$ is the probability of
detection, expressed as an integral of the known submarine probability
density $p(x)$ and the conditional probability of detection $g(\varphi(x))$.
The constraints require that the total amount of search effort be equal
to $A$ and that $\varphi(x)$ be non-negative over the search region $R$. Koopman
developed a graphical method to solve (1) with the exponential detection
function $g(\varphi(x)) = 1 - e^{-\varphi(x)}$. In 1958, Charnes and Cooper [7]
developed a method to obtain an analytical solution to a discrete form of
(1), again with the exponential detection function. Later, de Guenin [1]
(1961) obtained an analytic solution to (1) with, essentially, the
requirement that $g(t)$ have a decreasing derivative with increasing $t$ (the
detection function exhibits a saturation effect as effort increases).
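For a discrete form of (1) with the exponential detection function, the optimality conditions reduce to equalizing the marginal gain $p_i e^{-\varphi_i}$ across the cells that receive effort. The sketch below assumes n cells with known probabilities p[i] and a total effort A, and finds the common marginal gain by bisection. It only illustrates that condition; it is not Koopman's graphical method or the Charnes-Cooper solution, and the function name and data are hypothetical.

```python
import math


def koopman_allocation(p, total_effort, iterations=200):
    """Discrete form of (1) with g(t) = 1 - exp(-t): choose phi[i] >= 0 with
    sum(phi) == total_effort to maximize sum_i p[i] * (1 - exp(-phi[i])).

    At the optimum, p[i] * exp(-phi[i]) equals a common multiplier lam in every
    cell that receives effort, and p[i] <= lam in every cell that does not, so
    phi[i] = max(0, log(p[i] / lam)). The multiplier is found by bisection."""
    positive = [q for q in p if q > 0]

    def effort(lam):
        # Total effort implied by the multiplier lam (decreasing in lam).
        return sum(max(0.0, math.log(q / lam)) for q in positive)

    hi = max(positive)                                            # effort(hi) == 0
    lo = min(positive) * math.exp(-total_effort / len(positive))  # effort(lo) >= total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if effort(mid) > total_effort:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    phi = [max(0.0, math.log(q / lam)) if q > 0 else 0.0 for q in p]
    detection = sum(q * (1.0 - math.exp(-f)) for q, f in zip(p, phi))
    return phi, detection


# Hypothetical three-cell example with one unit of search effort.
phi, detection = koopman_allocation([0.5, 0.3, 0.2], 1.0)
```

In this example the least likely cell receives no effort at all, which is the saturation effect at work: effort goes first to the cells where the marginal detection gain is largest.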

Finally, Zahl [1] (1963) solved (1) with only a continuity restriction on
$g$. Some results on a sequential version of (1) were obtained by Dobbie
[1]. He derived conditions for the optimal distribution of effort
$E_1 + E_2$ to be the sum of the optimal distribution of $E_1$ and the
conditionally optimal distribution of $E_2$, given that the submarine has not
been found with the effort $E_1$.

The following non-sequential search problem was formulated and
solved by von Neumann [1] (1953). The search region is divided into
n cells. If P1 (searcher) and P2 (hider) both choose the same cell,
then P1 detects P2 with positive probability; otherwise, the detection
probability is zero. P1 attempts to maximize the probability of
detection, while P2 minimizes it. Hence, the theory of two-person zero-sum
games applies. Von Neumann went on to formulate another zero-sum
search game which is equivalent to the assignment problem. We
discuss both of these problems in further detail in Chapter 2. Neuts
[1] (1963) extended von Neumann's n-cell search game to an infinite
number of moves, but only a certain type of detection function is
permitted. We also discuss this extension in Chapter 2. This completes
our discussion of basic non-sequential search models and extensions to
the sequential case.
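For the first of these games, suppose (as an illustrative assumption) that the detection probability in cell i is $p_i > 0$ when both players choose cell i. If a player uses cell i with probability proportional to $1/p_i$, every pure reply by the opponent yields the same payoff, so that common payoff, $v = 1 / \sum_i (1/p_i)$, is the value of the game and the same mixture is optimal for both players. The sketch checks this with exact rational arithmetic; the function name and data are hypothetical.

```python
from fractions import Fraction


def diagonal_search_game(p):
    """n-cell game: P1 detects P2 with probability p[i] > 0 if both pick cell i,
    and with probability 0 otherwise. Returns (value, optimal mixed strategy);
    the same mixture is optimal for the searcher and for the hider."""
    if any(pi <= 0 for pi in p):
        raise ValueError("every detection probability must be positive")
    weights = [1 / Fraction(pi) for pi in p]   # use cell i in proportion to 1/p[i]
    total = sum(weights)
    value = 1 / total
    strategy = [w / total for w in weights]
    return value, strategy


p = [Fraction(1, 2), Fraction(1, 4)]   # hypothetical detection probabilities
value, strategy = diagonal_search_game(p)
```

Note that both players favor the cell where detection is hardest: the hider because it is safer there, and the searcher because it must compensate for the lower detection probability.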

In 1957, Bellman [1] formulated an n-cell sequential search
problem. He assumed that the searcher has a prior probability
distribution on the location of the submarine and that the cost of searching a
cell is also known. The searcher then looks in one cell at a time until
the submarine is found. Bellman found the policy (sequence of cells to
search) which minimizes the total expected cost. Gluss [1] (1961) added
to Bellman's problem a search cost which depends on the distance
between successive looks (moving cost). He found an optimal policy for
several important cases of the prior probabilities.

Various sequential search models have been formulated where
the prior probabilities are transformed to Bayesian posterior
probabilities. Neuts [1] extended Bellman's model to include Bayesian
updating of prior probabilities. Norris [1] employs a minimax
optimization for the Neuts model, but his results were only complete for
the case of two cells. Finally, Pollock [1] formulated a Bayesian
test-of-hypothesis model. At the beginning of this sequential search, the
searcher estimates the probability p that a submarine is present in the
search region. Each time a search is made, p changes according to the
Bayesian rule. After each search is made, the searcher takes one of
the following three decisions: (1) make another search, (2) accept the
hypothesis $H_0$ that a submarine is present, or (3) reject $H_0$. The
optimal decision depends on the current estimate of p and the costs of
wrong decisions.
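The Bayesian updating used in these models can be illustrated for a stationary submarine in one of n cells. Assuming a look in cell k detects a submarine that is there with probability $q_k$, an unsuccessful look multiplies the probability of cell k by $(1 - q_k)$ and renormalizes. The second function applies a well-known index rule for the cost-minimizing version of this problem: look next in the cell with the largest ratio of current detection probability $p_i q_i$ to search cost $c_i$. The names and numbers are hypothetical, and the rule is offered as background rather than as the policy derived by Bellman or Neuts.

```python
def posterior_after_miss(belief, q, searched):
    """Bayes' rule after an unsuccessful look in cell `searched`.
    belief[i]: current probability that the submarine is in cell i;
    q[i]: probability that a look in cell i detects a submarine that is there."""
    miss = 1.0 - belief[searched] * q[searched]     # P(the look fails)
    post = [b / miss for b in belief]
    post[searched] = belief[searched] * (1.0 - q[searched]) / miss
    return post


def next_cell(belief, q, cost):
    """Index rule: search the cell with the largest belief[i] * q[i] / cost[i]."""
    return max(range(len(belief)), key=lambda i: belief[i] * q[i] / cost[i])
```

With two "cells", present (detection probability q) and absent (detection probability 0), the same update gives the revised probability $p(1 - q)/(1 - pq)$ that a submarine is present after an unsuccessful search, which is the kind of revision used in a test-of-hypothesis model.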

The last type of search model which we discuss will only be
mentioned in passing. These are n-cell search models where a
submarine arrives in some cell of the search region at a random time during
the search. Such models have been studied by Blackman [1], Blackman
and Proschan [1], and Pollock [1].
1.3 Results

Bringing together the discussion of the last two sections, it
becomes apparent that models are required for hunter-killer operations
which allow the submarine as well as the searcher to make tactical
decisions. Most authors assume that the submarine maintains a known
stationary probability distribution. We do not make this assumption,
and furthermore we allow the probability of detection to be a function of
range. All of the distribution-of-effort models reviewed in the last
section have assumed a negligible radius of detection, and all of the
models, except the minimax models, assume a stationary submarine.
With these observations in mind, we preview the models and results
obtained in the next four chapters.

We introduce our results by chapter. The first model of Chapter 2
is an n-cell search game. This game is non-sequential, but extensions
of it to a sequential game are made in Chapter 3. We assume that P1
attempts to maximize the probability of detection while P2 attempts to
minimize it. In reality, a submarine need not be present in the search
region for this model to apply, and P2 may consist of one
or more submarines. However, we assume that P1 should act as if an
evading submarine were present. Hence, we seek a distribution of effort
for P1 which will maximize the minimum probability of detection
against all possible hiding strategies that P2 can choose. Our model
is also formulated to allow the probability of detection to be a function
of range.

Now, the above model only applies when neither P1 nor P2 has
information on where the other player is searching or hiding. However,
P1 may have intelligence information which can be used to bound the
probability that P2 is located in certain cells, or these bounds may arise
from previous searches. If such information is available, then the
optimal distribution of effort obtained from the foregoing model will be
too "conservative". To take into account certain information on hiding
locations, we show how to extend the game to a constrained game.*
This extension will make P1's strategy less conservative but perhaps
more risky. We also show how to include other types of information
which may arise in searching situations. Finally, we give an example
of this model and obtain an analytic solution for the special case of a
negligible radius of detection.

We also propose a second model in Chapter 2. This model is
especially suited to search in sweeps of the search region, for example,
search by aircraft. Again the opposing objectives of detection and
evasion are assumed, and the searcher seeks to minimax the
probability of detection. The game formulation of this problem is reduced to
a dyadic model, a generalization of the distribution problem. Special
cases of this game result in a transportation and an assignment
problem. The chapter is completed with tactical examples of this
model. Both of these games are generalizations of von Neumann's [1]
search games.

*The notion of a constrained game as incorporating "habits" or
other qualitative probabilistic information was introduced by Charnes in
1953.

In Chapter 3, we develop a sequential extension of the n-cell
search model of Chapter 2. When the players each choose a cell, they
determine not only a probability of detection but also a
probability that the game is played again. We consider both a finite
and an infinite number of moves. In the finite case, optimal strategies
can be found by solving a recursive sequence of two-person zero-sum
games. Substantially less computational effort is required by this
procedure than by solution of the normalized form of this game. For the
infinite game, the problem is reduced to finding strategies X and Y
which minimax the form

$$\frac{X^{t} A Y}{X^{t} Q Y},$$

where A is the payoff matrix for each move and Q is the matrix of
non-zero stop probabilities. We show how
to find the optimal strategies by solving a linear programming problem
with a parameter in the constraint set. We demonstrate that optimal
strategies are obtained when this parameter is chosen to make the
optimal value of the objective function equal to zero. Then we develop
a technique to find such a value of the parameter in a finite number of
steps. Chapter 3 is concluded with an example which compares the
non-sequential game to both the finite and infinite sequential search
games.
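The parametric idea can be sketched numerically. If every stop probability $Q_{ij}$ is positive, the value of the game with matrix $A - \lambda Q$ is a strictly decreasing function of $\lambda$, so the $\lambda$ that makes it zero can be bracketed and found by bisection. That root is the value of the infinite game, because $v = \operatorname{val}[A_{ij} + (1 - Q_{ij}) v]$ holds exactly when $\operatorname{val}(A - vQ) = 0$. This sketch uses bisection and a closed-form 2 x 2 game value in place of the linear programming procedure described above; the matrices and function names are hypothetical.

```python
def value_2x2(M):
    """Value of the 2 x 2 zero-sum game M (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:              # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)


def ratio_game_value(A, Q, iterations=200):
    """Find lam with val(A - lam * Q) == 0 for 2 x 2 matrices with Q[i][j] > 0."""
    def h(lam):
        return value_2x2([[A[i][j] - lam * Q[i][j] for j in range(2)]
                          for i in range(2)])

    ratios = [A[i][j] / Q[i][j] for i in range(2) for j in range(2)]
    lo, hi = min(ratios), max(ratios)   # every entry >= 0 at lo, <= 0 at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The bracket is valid because at the smallest ratio $A_{ij}/Q_{ij}$ every entry of $A - \lambda Q$ is non-negative (so the value is non-negative), and at the largest ratio every entry is non-positive.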

In Chapter 4 we study the contact problem. However, the models
which are developed may also be applied to some types of search
situations. Briefly, we view the contact problem as a certain game of
pursuit between P1 and P2. This game consists of a finite collection
of states, and each state corresponds to a possible tactical configuration
of the hunter-killer forces. At each move, the players observe the
state of the game and each player chooses a tactical plan from a finite
collection. The observed state and the chosen tactical plans jointly
determine an immediate payoff and a transition probability
distribution over the states. Before the next move is made, the game
transits to one of the states or terminates according to this
probability distribution. We seek to find an optimal strategy for each
player. An optimal strategy is one of a minimax pair for the total
expected payoff.

This game is a stochastic game due to Shapley [1]. He defined
a vector value and employed an ingenious argument to establish its
existence and that of optimal strategies. In addition, he showed that
the value and optimal strategies are characterized by a non-linear fixed
point problem. We show how to approximate the solution to this fixed
point problem by linear programming methods. Two methods are given:
one iterates on the strategies and the other iterates on the payoffs.
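A minimal sketch of iterating on the payoffs, under stated assumptions: each state s has a 2 x 2 stage game with immediate payoffs A[s][i][j] and transition probabilities P[s][i][j][t] to state t, and every set of transition probabilities sums to less than one, so that play stops with positive probability. Each sweep replaces the value vector v by the values of the games $A_s + \sum_t P_s(t)\, v_t$, which is Shapley's fixed-point map. Here the stage games are solved in closed form rather than by linear programming, and the data and names are hypothetical.

```python
def value_2x2(M):
    """Value of the 2 x 2 zero-sum game M (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:              # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)


def shapley_value_iteration(A, P, sweeps=500):
    """A[s][i][j]: immediate payoff in state s when P1 plays i and P2 plays j.
    P[s][i][j][t]: probability of moving to state t (each list sums to < 1;
    the remainder is the probability that play stops). Returns the value vector."""
    n = len(A)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [value_2x2([[A[s][i][j] + sum(P[s][i][j][t] * v[t] for t in range(n))
                         for j in range(2)]
                        for i in range(2)])
             for s in range(n)]
    return v


# Hypothetical two-state example.
A = [[[1.0, 0.2], [0.4, 0.8]],      # payoffs in state 0
     [[0.1, 0.0], [0.0, 0.3]]]      # payoffs in state 1
P = [[[[0.5, 0.2], [0.3, 0.4]],     # state 0: P[0][i][j] = [to state 0, to state 1]
      [[0.4, 0.3], [0.6, 0.1]]],
     [[[0.3, 0.5], [0.2, 0.6]],     # state 1
      [[0.1, 0.7], [0.4, 0.4]]]]
v = shapley_value_iteration(A, P)
```

Because every continuation probability is below one, each sweep is a contraction, so the iterates converge to the unique fixed point regardless of the starting vector.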

To introduce more realism into the game, several variants of
the stochastic game are considered. One of these variants is the
extension of Charnes' notion of a constrained game to stochastic games.
We also examine a stochastic game with perfect information which is
a terminating Markovian decision process, and we extend some of the
known results. Finally, we introduce a finite version of the stochastic
game and show how our linear programming methods may be used to
obtain a solution. All of the above models are examined in light of
their tactical consequences and their applications to hunter-killer
operations.

In Chapter 5 we turn to a different type of problem than those
considered thus far. Here we are concerned with distributing
hunter-killer forces to multiple contacts. Four models are formulated,
starting with simple situations and progressing to a dynamic problem. The
first two models deal with the allocation of a fixed number of units to
two or more contact areas. A specified mission is to be accomplished
in each area, but the number of units required for this purpose is a
random variable. This random requirement may arise due to
insufficient intelligence on enemy capabilities and objectives or other
uncertainties.

In the first model, we introduce a novel objective function.
The objective is to maximize the probability that all requirements are
met, or equivalently to maximize the probability that all missions are
simultaneously accomplished. The constraint set consists of a single
constraint on the total amount of effort available and non-negativity
restrictions. We develop an algorithm to find the analytic solution when
the requirements are uniformly and independently distributed. The
second model is obtained by taking the following objective: minimize
the sum of the expected shortages. The resulting model is a problem
in generalized constrained medians as discussed by Charnes, Cooper,
and Thompson [2]. We find that the assumption of uniformly and
independently distributed requirements, in this case, leads to a quadratic
programming problem.
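To see why uniform requirements lead to a quadratic program, suppose (for illustration) that the requirement R in an area is uniform on [0, b] and x units are assigned there. Then the expected shortage is $E[\max(R - x, 0)] = \int_x^b (r - x)\, \frac{dr}{b} = \frac{(b - x)^2}{2b}$ for $0 \le x \le b$, a quadratic in x. Minimizing the sum of these terms subject to $\sum_i x_i = N$, by equating the marginal reductions $(b_i - x_i)/b_i$, gives $x_i = b_i N / \sum_k b_k$ whenever $0 \le N \le \sum_k b_k$. This derivation and the function names are ours, offered only as a sketch under these assumptions; the code also checks the formula by numerical integration.

```python
def expected_shortage_uniform(x, b):
    """E[max(R - x, 0)] for R uniform on [0, b] and an allocation x >= 0."""
    if x >= b:
        return 0.0
    return (b - x) ** 2 / (2.0 * b)


def expected_shortage_numeric(x, b, steps=100000):
    """Midpoint-rule approximation of the same expectation, as a check."""
    h = b / steps
    return sum(max((k + 0.5) * h - x, 0.0) for k in range(steps)) * h / b


def proportional_allocation(total, b):
    """Minimize sum_i (b_i - x_i)^2 / (2 b_i) subject to sum_i x_i = total."""
    s = sum(b)
    if not 0 <= total <= s:
        raise ValueError("total must lie between 0 and sum(b)")
    return [bi * total / s for bi in b]
```

Each term is convex in $x_i$, so the equal-marginal condition identifies the minimum; with unequal distributions or other bounds, a general quadratic programming code would be needed instead of this closed form.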

Next, we examine tactical situations where the distribution time
is an important measure of effectiveness. Here we obtain a
chance-constrained distribution (transportation) model. The availabilities are
known, but again the requirements for a specified mission in each contact
area are discrete random variables. The deterministic equivalent for
this problem has discrete availabilities and requirements, and therefore
non-integer distribution techniques may be employed to obtain an
optimal integer solution.
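The integrality argument can be illustrated under one common reading of such a chance constraint, which we assume here: the amount $\sum_i x_{ij}$ sent to area j must cover the random requirement $R_j$ with probability at least $\alpha_j$. With a zero-order rule the $x_{ij}$ are fixed numbers, so the constraint is equivalent to $\sum_i x_{ij} \ge r_j$, where $r_j$ is the smallest value with $P(R_j \le r_j) \ge \alpha_j$. When $R_j$ takes integer values, $r_j$ is an integer, and the deterministic equivalent is an ordinary transportation-type problem with integer data. The function name and the distribution are hypothetical.

```python
def deterministic_requirement(pmf, alpha):
    """Smallest r with P(R <= r) >= alpha for a requirement R with
    probabilities pmf[r]; integer support gives an integer r."""
    cumulative = 0.0
    for r in sorted(pmf):
        cumulative += pmf[r]
        if cumulative >= alpha - 1e-12:   # small tolerance for rounding in the sum
            return r
    raise ValueError("alpha exceeds the total probability")


# Hypothetical requirement distribution for one contact area (units needed).
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
```

Raising the required confidence $\alpha_j$ can only raise $r_j$, so stricter chance constraints translate directly into larger integer requirements in the deterministic equivalent.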

The final model is simply a dynamic two-period version of the
above distribution model. Here, we employ a zero-order decision rule
for both periods, and a method is given to allow the second-period
allocations to depend on the requirements observed in the first period.
In addition, we indicate how a linear decision rule can be applied to this
type of model.


1.4 A Survey of Mathematical Programming

Topics in mathematical programming that are relevant to this work are
surveyed here. This survey is intended for the well-versed reader, and
only an orientation to several important topics is desired. Therefore,
the treatment is brief and no extensive literature citations are included.
We cover the following topics: the transportation problem; linear
programming, including duality; and some aspects of chance-constrained
programming. Only the models and the main theorems are presented.
The reader is referred to the literature for a discussion of standard
solution techniques such as the simplex method.

The modern form of the transportation problem was first
formulated and studied by Hitchcock [1], although more general forms of
this problem were studied as early as 1939 (Kantorovich) but were not
available until some years after World War II.* The transportation
model may be visualized by supposing that there are m shipping points
(origins), with $a_i$ units available at origin i (i = 1, ..., m), and n
destinations, with $b_j$ units required at destination j (j = 1, ..., n). Units can
be shipped from each origin to any destination, and a shipping cost of
$c_{ij}$ is incurred when one unit is shipped from origin i to destination j.
The problem is to find a shipping schedule (number of units to be sent
from each origin to each destination) which minimizes total cost.
Accordingly, we let $x_{ij}$ be the number of units to be shipped from
origin i to destination j. Then the mathematical problem (model) may
be stated as follows. Find the values of $x_{ij}$ which

$$\min \sum_{i,j} c_{ij} x_{ij} \tag{2}$$

subject to:

$$\sum_{j} x_{ij} = a_i \tag{2a}$$

$$\sum_{i} x_{ij} = b_j \tag{2b}$$

$$x_{ij} \ge 0$$

Constraints (2a) and (2b) require that the total amount sent from each
origin is equal to the amount available there and the total amount sent
to each destination is equal to the requirement.

*See Charnes and Cooper [5] for an extensive discussion of
historical developments and early work.


We list the following well-known properties of (2).²

(a) Problem (2) (with all $a_i, b_j \ge 0$) has an optimal solution if and only if

$$\sum_{i} a_i = \sum_{j} b_j.$$

(b) If the $a_i$ and $b_j$ are all integers, then every basic³
feasible solution to (2) has integer-valued variables.
It follows that at least one optimal solution has integer-valued
variables, and the optimal integer solutions may
be found by the usual non-integer adjacent extreme
point methods.

¹ The subscripts vary over their entire range when the range is
not indicated.

² See Charnes and Cooper [5] for further elaboration.

³ A basic feasible solution has no more than m + n - 1 (the number
of linearly independent constraints) positive variables.

The above properties are utilized in Chapters 2 and 5. We also
encounter inequalities in the constraints (2a) and (2b), but we show how
to reduce these inequality forms to the standard form when this
reduction is needed.
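Property (b) can be seen concretely with the northwest-corner rule, one standard way to obtain a starting basic feasible solution for (2), chosen here for illustration since the text does not prescribe a starting method. Each step ships the smaller of the remaining availability and requirement, so with integer $a_i$ and $b_j$ only integers are ever produced, and at most m + n - 1 cells receive positive shipments. The function name and data are hypothetical.

```python
def northwest_corner(a, b):
    """Starting basic feasible solution for (2) by the northwest-corner rule.
    Assumes sum(a) == sum(b); integer a and b give integer shipments."""
    if sum(a) != sum(b):
        raise ValueError("total availability must equal total requirement")
    supply, demand = list(a), list(b)
    m, n = len(a), len(b)
    x = [[0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        ship = min(supply[i], demand[j])   # ship as much as possible in cell (i, j)
        x[i][j] = ship
        supply[i] -= ship
        demand[j] -= ship
        if supply[i] == 0:                 # origin i exhausted: move down
            i += 1
        else:                              # destination j satisfied: move right
            j += 1
    return x


# Hypothetical data: three origins and three destinations.
x = northwest_corner([20, 30, 25], [10, 25, 40])
```

The rule ignores costs, so the schedule it returns is only a starting point for the improvement steps; its value here is that integrality and the m + n - 1 bound are visible in every step.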

The most widely used method to solve (2) consists of three steps.

(a) Find a basic feasible solution (b.f.s.).

(b) Evaluate the current b.f.s. for optimality.

(c) If the b.f.s. is not optimal, move to another b.f.s.
which decreases the value of the objective function, and
return to step (b).

If (2) has a feasible solution, then the above method converges to an
optimal solution in a finite number of steps. In terms of geometry, this
method is an adjacent extreme point method, and the key mathematical
property which makes the method work is the equivalence of basic
feasible solutions and extreme points of the convex set of feasible
solutions. A more detailed discussion of these ideas would lead us too far
afield from the purpose of this survey. Therefore, we turn to a brief
discussion of linear programming.


Next, we define a linear program and then give the dual theorem
of linear programming. A linear program consists of a linear
objective function which is to be optimized and linear constraints. In
addition, non-negativity restrictions on the variables are usually stated
separately. Every linear program may be written in the following form:

$$\max\; c^{t} X \quad \text{subject to} \quad AX \le b, \quad X \ge 0 \tag{3}$$

where c and X are n-vectors, b is an m-vector, A is an $m \times n$ matrix,
and c, b, and A all have constant elements. The set
$S = \{X \mid AX \le b,\ X \ge 0\}$ is called the set of feasible solutions. The
problem is to find an $\hat{X} \in S$ which maximizes $c^{t} X$ over all $X \in S$. If
such an $\hat{X}$ exists (so that the maximum is finite), then we call $\hat{X}$ an
optimal solution to (3). Not every linear program has an optimal solution,
but exactly one of the following three cases must occur:

(a) No feasible solution exists.

(b) An optimal solution exists.

(c) $c^{t} X$ is unbounded above over S.

As in the transportation problem, the methods which are available
to solve a linear program depend on the equivalence of extreme
points and basic feasible solutions, and on the optimality of at least one
basic feasible solution if an optimal solution exists. Of these adjacent
extreme point methods, the most widely used are the simplex
method of Dantzig [2] and the dual method of Lemke [1]. The simplex
method consists of the same steps as those outlined for the
transportation problem; however, the means of going from one step to the next
are different in each case. It starts from a basic feasible solution
(b.f.s.), and several techniques are available to get an initial b.f.s.
With a starting technique and the simplex method, one will arrive at an
optimal solution, or at case (a) or (c) above, in a finite number of steps.

With every linear program there is associated another linear
program called the dual. The dual to problem (3) is defined as

$$\min\; W^{t} b \quad \text{subject to} \quad W^{t} A \ge c^{t}, \quad W \ge 0 \tag{4}$$

There  are  certain  surprising  relations  between  the  solutions  to  (3)  and 
its  dual  (4).  These  relationships  are  summed  up  by  the' 

dual  theorem:  Solutions  to  the  primal  (3)  and  dual  (4)  are  related  as  follows: 

(i)  Problem  (3)  has  an  optimal  solution  if  and  only  if 
(4)  has  an  optimal  solution. 

(ii)  When  (3)  and  (4)  have  optimal  solutions  X  and  W 
respectively,  then  c*X  =  W^b, 

(iii)  If  either  (3)  or  (4)  has  an  unbounded  solution,  then 
the  other  problem  has  no  feasible  solution. 

Proof  -  see  Charnes  and  Cooper  [5]  for  details. 
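As a small numerical check of parts (i) and (ii), the sketch below solves a hypothetical instance of (3), $\max\, 3x_1 + 5x_2$ subject to $x_1 \le 4$, $2x_2 \le 12$, $3x_1 + 2x_2 \le 18$, $X \ge 0$, together with its dual (4). It assumes (3) has the form $\max c^tX$, $AX \le b$, $X \ge 0$. It is not the simplex method: for illustration only, it enumerates every basic solution exactly with Python's `fractions` module, which is practical only for very small problems. The helper names `solve` and `best_vertex` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Solve the square system M z = rhs exactly (Gauss-Jordan); None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def best_vertex(c, G, h):
    """Maximize c.z over {z : G z <= h} by checking every basic solution.

    Assumes the optimum is attained at a vertex (feasible, bounded, no lines).
    """
    best = None
    for rows in combinations(range(len(G)), len(c)):
        z = solve([G[i] for i in rows], [h[i] for i in rows])
        if z is None:
            continue  # chosen constraints do not determine a unique point
        if all(sum(g * zk for g, zk in zip(G[i], z)) <= h[i] for i in range(len(G))):
            value = sum(ck * zk for ck, zk in zip(c, z))
            if best is None or value > best[0]:
                best = (value, z)
    return best


# Hypothetical data for (3): max c^t X subject to A X <= b, X >= 0
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
c = [3, 5]

# Primal (3), with X >= 0 written as -X <= 0
primal_value, X_hat = best_vertex(c, A + [[-1, 0], [0, -1]], b + [0, 0])

# Dual (4): min W^t b s.t. W^t A >= c^t, W >= 0,
# rewritten as max -b^t W s.t. -A^t W <= -c, -W <= 0
At = [list(col) for col in zip(*A)]
dual_G = [[-a for a in row] for row in At] + [
    [-1 if i == k else 0 for k in range(len(b))] for i in range(len(b))
]
neg_dual_value, W_hat = best_vertex([-bi for bi in b], dual_G, [-cj for cj in c] + [0] * len(b))
dual_value = -neg_dual_value

print(X_hat, primal_value)   # values: X_hat = (2, 6), primal value 36
print(W_hat, dual_value)     # values: W_hat = (0, 3/2, 1), dual value 36
```

Both programs reach the same optimal value, as part (ii) asserts.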


In addition, it is possible for both problems to have no feasible solution. Therefore, if a linear program has no feasible solution, its dual is either unbounded or infeasible.

The practical significance of the duality relationships lies primarily in the sensitivity information which is available. It can be shown that $\hat{w}_i$ is the change in the optimal value of the primal objective function per unit change in $b_i$, provided the change in $b_i$ is small enough that the optimal basis is unchanged. But the physical interpretation of $\hat{w}_i$ will depend on the actual physical process which is modeled. Another important point which should be made here is that an optimal solution to the dual problem is available when the primal is solved by the simplex method. Hence, sensitivity information is immediately available.
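The sensitivity interpretation can be illustrated on a hypothetical LP, $\max\, 3x_1 + 5x_2$ subject to $x_1 \le 4$, $2x_2 \le 12$, $3x_1 + 2x_2 \le 18$, $X \ge 0$, whose optimal dual solution is $\hat{W} = (0, 3/2, 1)$. The sketch below raises each $b_i$ by one unit and records the change in the optimal primal value; for this data a unit change is small enough that the optimal basis stays the same, so each change equals $\hat{w}_i$. The helper `lp_max_2d` is a hypothetical two-variable vertex search, not a simplex implementation.

```python
from fractions import Fraction
from itertools import combinations


def lp_max_2d(c, G, h):
    """Maximize c.x over {x in R^2 : G x <= h} by testing every intersection of two
    constraint lines (assumes a nonempty, bounded feasible polygon)."""
    best = None
    for i, j in combinations(range(len(G)), 2):
        (a1, a2), (b1, b2) = G[i], G[j]
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines have no intersection point
        # Cramer's rule for a1*x + a2*y = h[i], b1*x + b2*y = h[j]
        x = Fraction(h[i] * b2 - a2 * h[j], det)
        y = Fraction(a1 * h[j] - h[i] * b1, det)
        if all(g1 * x + g2 * y <= hk for (g1, g2), hk in zip(G, h)):
            value = c[0] * x + c[1] * y
            best = value if best is None else max(best, value)
    return best


def optimal_value(b):
    """Optimal value of the hypothetical LP max 3x1 + 5x2, A x <= b, x >= 0."""
    A = [[1, 0], [0, 2], [3, 2]]
    return lp_max_2d([3, 5], A + [[-1, 0], [0, -1]], list(b) + [0, 0])


b = [4, 12, 18]
W_hat = [0, Fraction(3, 2), 1]   # optimal dual solution of this LP
base = optimal_value(b)

# Raise each b_i by one unit and record the change in the optimal primal value
deltas = []
for i in range(len(b)):
    bumped = list(b)
    bumped[i] += 1
    deltas.append(optimal_value(bumped) - base)

print(base, deltas)   # the deltas coincide with W_hat
```

The first constraint is slack at the optimum, so relaxing it changes nothing, which matches $\hat{w}_1 = 0$.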


Next, we give the theorem of the alternative, which follows immediately from the dual theorem.

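Theorem of the alternative:* Suppose $\hat{X}$ and $\hat{W}$ are optimal solutions to (3) and (4) respectively. Let $\hat{X}_s$ and $\hat{W}_s$ be the slack vectors for these optimal solutions, that is, $\hat{X}_s = b - A\hat{X}$ and $\hat{W}_s = A^t\hat{W} - c$. Then

$$\hat{W}^t\hat{X}_s = 0, \qquad \hat{W}_s^t\hat{X} = 0.$$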
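The two complementary slackness conditions can be checked directly on a hypothetical instance, $\max\, 3x_1 + 5x_2$ subject to $x_1 \le 4$, $2x_2 \le 12$, $3x_1 + 2x_2 \le 18$, $X \ge 0$, whose optimal solutions are $\hat{X} = (2, 6)$ and $\hat{W} = (0, 3/2, 1)$. The sketch assumes the slack vectors are $\hat{X}_s = b - A\hat{X}$ and $\hat{W}_s = A^t\hat{W} - c$, and simply evaluates both inner products.

```python
from fractions import Fraction

# Hypothetical LP: max c^t X subject to A X <= b, X >= 0
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
c = [3, 5]
X_hat = [2, 6]                   # optimal primal solution for this data
W_hat = [0, Fraction(3, 2), 1]   # optimal dual solution for this data

# Slack vectors (assumed definitions): X_s = b - A X_hat, W_s = A^t W_hat - c
X_s = [bi - sum(a * x for a, x in zip(row, X_hat)) for row, bi in zip(A, b)]
W_s = [sum(A[i][j] * W_hat[i] for i in range(len(A))) - c[j] for j in range(len(c))]

cs_primal = sum(w * s for w, s in zip(W_hat, X_s))   # W_hat^t X_s
cs_dual = sum(s * x for s, x in zip(W_s, X_hat))     # W_s^t X_hat
print(X_s, W_s, cs_primal, cs_dual)
```

Here the only slack primal constraint ($x_1 \le 4$, slack 2) is paired with $\hat{w}_1 = 0$, so both products vanish.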

This  result  is  quite  useful  for  analysis  and  it  also  provides  additional 
primal-dual  interpretations.  We  continue  now  with  pertinent  topics 
from  chance -constrained  programming. 

The idea of chance-constrained programming was first introduced by Charnes, Cooper, and Symonds at the December 1953 meeting of the Econometric Society. But four years of refereeing elapsed before it was published in journal form in [1] (1958). Since then a number of problems have been treated, but we restrict our discussion to a definition of chance-constrained programming and the so-called "zero-order decision rule." This rule will be applied to a transportation-type problem in Chapter 5.

*This theorem is also known as the principle of complementary slackness.

To define chance-constrained programming (ccp), we draw on the following definition given by Charnes & Cooper [4].

"Chance-constrained programming admits random data variations and permits constraint violations up to specified probability limits. Different kinds of decision rules and optimizing objectives may be used so that, under certain conditions, a programming problem (not necessarily linear) can be achieved that is deterministic - in that all random elements have been eliminated."

The reader may refer to Charnes and Cooper [4] and subsequent papers for some general formulations of a ccp. We restrict our attention here to the following problem ("zero-order decision rules"):

$$\max\; E(c^tX) \qquad (5)$$

$$\Pr(AX \le b) \ge d \qquad (5.1)$$

where "E" denotes the expectation operator, c and X are n-vectors, b and d are m-vectors, and A is an m × n matrix of constants. We assume that b is a vector of random variables with a known continuous joint cumulative distribution function (c.d.f.). We further assume that c is a vector of random variables with known and finite means. The double inequality in (5.1) reads as follows: the probability that $AX \le b$ is no less than d. Hence the constraint $AX \le b$ can be violated, but each of its rows $a_iX \le b_i$ must be satisfied with probability at least $d_i$.

We rewrite (5.1) as

$$\Pr(a_iX \le b_i) \ge d_i, \quad i = 1, \dots, m \qquad (6)$$

where $a_i$ is the $i$th row of the matrix A, and $b_i$ and $d_i$ are the $i$th components of b and d respectively. Let $F_i$ be the marginal c.d.f. of the random variable $b_i$. Then (6) is equivalent to

$$1 - F_i(a_iX) \ge d_i, \quad i = 1, \dots, m \qquad (7)$$

provided we are using zero-order decision rules, i.e., X is not a function of the random variables and A is a constant matrix. Since $F_i$ is continuous and strictly increasing, its inverse $F_i^{-1}$ exists. It follows immediately that X satisfies (7) if and only if X satisfies

$$a_iX \le F_i^{-1}(1 - d_i), \quad i = 1, \dots, m.$$
Because of this relationship, (5) is equivalent to

$$\max\; \bar{c}^tX \quad \text{subject to} \quad a_iX \le F_i^{-1}(1 - d_i), \quad i = 1, \dots, m \qquad (8)$$

where $\bar{c}$ is the vector of mean values of c (since X is not random, $E(c^tX) = \bar{c}^tX$). The above linear program is called the deterministic equivalent of (5). From the linearity of (8), all of the relationships of linear programming, including duality, can be brought to bear on (8), or equivalently on (5).
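The equivalence of (6), (7) and the linear constraint can be sketched for a single row under a hypothetical assumption: $b_i$ is normally distributed with mean 10 and standard deviation 2, $a_i = (1, 2)$ and $d_i = 0.95$. The sketch uses `statistics.NormalDist` from the Python standard library (3.8+) for $F_i$ and $F_i^{-1}$, and checks for a few fixed (zero-order) decisions X that the chance constraint and its deterministic equivalent agree.

```python
from statistics import NormalDist

# Hypothetical single row of (6): a_i X <= b_i with b_i ~ Normal(10, 2^2), d_i = 0.95
a_i = [1.0, 2.0]
F_i = NormalDist(mu=10.0, sigma=2.0)   # marginal c.d.f. of b_i
d_i = 0.95

# Right-hand side of the deterministic equivalent: F_i^{-1}(1 - d_i)
rhs = F_i.inv_cdf(1.0 - d_i)


def chance_constraint_holds(X, tol=1e-12):
    """Constraint (7): Pr(a_i X <= b_i) = 1 - F_i(a_i X) >= d_i for a fixed X."""
    lhs = sum(a * x for a, x in zip(a_i, X))
    return 1.0 - F_i.cdf(lhs) >= d_i - tol


def deterministic_constraint_holds(X, tol=1e-12):
    """Deterministic equivalent: a_i X <= F_i^{-1}(1 - d_i)."""
    return sum(a * x for a, x in zip(a_i, X)) <= rhs + tol


for X in ([0.0, 0.0], [1.0, 2.0], [2.0, 3.0], [5.0, 1.0]):
    print(X, chance_constraint_holds(X), deterministic_constraint_holds(X))
```

With these numbers the right-hand side is about 6.71, i.e. roughly 1.645 standard deviations below the mean of $b_i$.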
In multiperiod models it is often desirable to determine the optimal value of X adaptively. That is to say, X should depend on the actual values of b which are observed. This dependence gives rise to the notion of a decision rule as defined by Charnes and Cooper [5] and extensively studied in Charnes and Kirby [1]. The latter prove the optimality of piecewise linear decision rules for the E-model. Charnes and Cooper have particularly studied the class of linear decision rules

$$X = Db + \alpha$$

where the elements of the matrix D and the vector $\alpha$ are unknown constants. These constants are to be determined by reference to (5). The above relationship for X is substituted into (5), and Charnes and Cooper [5] then obtain a deterministic equivalent convex programming problem when the random variables b are normally distributed. Solving this deterministic equivalent yields an optimal D and $\alpha$, which in turn specify an optimal X for each observed b via the above linear rule. Additional material on chance-constrained programming may be found in the references listed under Charnes, et al.
1.5 Elements of Game Theory

We discuss certain elementary concepts from the theory of games which will be used extensively. The following topics are considered in turn: definitions for a game, the minimax theorem, the linear programming formulation, the extensive form, and Kuhn's theorem of perfect recall. The first formulation of the modern theory of games and the minimax theorem was given by von Neumann [2]. Subsequently, von Neumann and Morgenstern [1] brought the theory to a high state of development. Additional contributions are scattered throughout the literature; however, a large number of them are contained in the Princeton series "Contributions to the Theory of Games" and the Proceedings of the National Academy of Sciences (U.S.A.). In addition, books on the theory of games include von Neumann and Morgenstern [1], Blackwell and Girshick [1], Dresher [1], Karlin [1], Luce and Raiffa [1], and McKinsey [1].

In order to provide a common ground for discussion, it is necessary to introduce several definitions.

1. A game is defined by the totality of its rules.

2. A play of the game is one complete execution of the set of rules.

3. A move is defined as a point in the game at which one of the players must choose an alternative.

4. An alternative is one of the choices which a player may take when it is his move.

The rules of a game distinguish one game from another. They specify the sequence in which the players move, the amount of information which is available to each player, what the payoffs are, how a play terminates, and the alternatives which are available. The rules determine a payoff to each player in the following way. Let $M_1, \dots, M_\nu$ represent the sets of alternatives at the moves in a game, and let $\alpha = (a_1, a_2, \dots, a_\nu)$ be a sequence of alternatives with $a_i \in M_i$ $(i = 1, \dots, \nu)$. Then the sequence $\alpha$ is a play of the game. Suppose there are n players denoted by $P_1, P_2, \dots, P_n$. The rules specify a set of functions $\{F_j(\alpha) : j = 1, \dots, n\}$ for each play $\alpha$, with $F_j(\alpha)$ being the payoff received by $P_j$. If for some $\alpha$, $F_j(\alpha) \ge 0$, then $P_j$ receives the amount $F_j(\alpha)$, and if $F_j(\alpha) < 0$, then $P_j$ pays the amount $-F_j(\alpha)$. A game is called zero-sum if $\sum_{j=1}^{n} F_j(\alpha) = 0$ for each $\alpha$. In the remainder of this discussion we restrict our attention to two-person (n = 2) zero-sum games. These games describe situations of conflict between two opposing interests, and they are used exclusively to model the tactical situations under consideration.

von Neumann and Morgenstern [1] introduced the useful notion of the extensive form of a game. In this form, a game is viewed as a tree consisting of nodes and branches. A node corresponds to a move for a particular player, and the branches emanating from a node represent the alternatives which are available. The origin of the tree corresponds to the first move, and successive nodes correspond to successive moves. Then each play is represented by a unicursal path from the origin of the tree to a terminal branch, and each terminal branch corresponds to precisely one play of the game. We illustrate these ideas with the following example:


Example  1 

The number beside each node designates whose move it is. The dotted lines define information sets. Roughly speaking, an information set tells us what a player knows at each move. A player will know which information set he is in, but not which node he is at. To illustrate the concept of information, we give the following scenario of the above game. P1 moves first and chooses one of three alternatives.* The information sets for the next move, P2's move, tell us that if P1 chooses alternative 1, then P2 is informed of this, but if P1 chooses alternative 2 or 3, then P2 is not informed of the specific alternative chosen by P1. Now on P1's second move, P1 is forced to forget whether he had chosen alternative 1 or 2 on the first move, but P1 remembers whether or not he had taken alternative 3. Further, P1 does not know which alternative is taken by P2.

*For convenience, we number alternatives in a clockwise direction.

We define a pure strategy. A pure strategy for a player is a function from the set of all possible histories of the game into the sets of alternatives. It is a specification of an alternative at each move for each possible history up to the move. In Example 1, player 2 has eight pure strategies corresponding to the eight ways to map the set {1, 2, 3} into the set {1, 2}. Each of these ways is a specification of which alternative P2 should choose (1 or 2) depending on what P1 chooses (1, 2, or 3). In a similar way we can enumerate the pure strategies for P1. Here a pure strategy tells P1 what alternative to choose for both of his moves as a function of the history of the game.

von Neumann and Morgenstern [1] have shown that every finite two-person zero-sum game can be reduced to normal form. In normal form the game is represented as an m × n matrix A, with each row corresponding to a pure strategy for P1 and each column corresponding to a pure strategy for P2. If P1 chooses row i and P2 chooses column j, then P1 receives $a_{ij}$ from P2, and since the game is zero-sum, P2 receives $-a_{ij}$ from P1. We refer to these games as matrix games, and A is called P1's payoff matrix. With the game in normal form, we now think of it as consisting of one move by each player, with the pure strategies being the alternatives. Either player may move first, provided the second player to move is in ignorance of the alternative chosen by the first player; equivalently, the players may move simultaneously, each in ignorance of the other's move.

The normal form is convenient for a discussion of rational play and mixed strategies. Von Neumann's [2] concept of rational play requires that each player maximize his minimum expected payoff, or simply minimax the expected payoff. To get to the heart of this matter, we introduce mixed strategies and the minimax theorem of von Neumann [2]. A mixed strategy for a player is a probability distribution over the available alternatives (pure strategies). Accordingly, we assume P1 plays alternative i with probability $x_i$ (i = 1, ..., m) and P2 plays his alternative j with probability $y_j$ (j = 1, ..., n). We call the m × 1 vector $X = (x_1, \dots, x_m)$ and the n × 1 vector $Y = (y_1, \dots, y_n)$ mixed strategies for P1 and P2 respectively. Since X and Y are probability distributions, we must have

$$\sum_{i=1}^{m} x_i = 1, \quad x_i \ge 0; \qquad \sum_{j=1}^{n} y_j = 1, \quad y_j \ge 0.$$

We let $E(X, Y) = X^tAY$, where A is P1's payoff matrix. Then E(X, Y) is the expected payoff to P1 when P1 chooses X and P2 chooses Y. The following remarkable theorem consolidates the theory of matrix games.

Minimax theorem (von Neumann). For every matrix A, there exist strategies $\hat{X}$ and $\hat{Y}$ such that

$$E(X, \hat{Y}) \le E(\hat{X}, \hat{Y}) \le E(\hat{X}, Y) \quad \text{for all strategies } X \text{ and } Y. \qquad (9)$$

The strategies $\hat{X}$ and $\hat{Y}$ are called optimal strategies for P1 and P2 respectively, and $v = E(\hat{X}, \hat{Y})$ is termed the value of the game.

We can immediately interpret the meaning of an optimal strategy, and equivalently of rational play. From equation (9), if P1 plays $\hat{X}$, then he receives at least v regardless of the strategy P2 employs. Furthermore, P2 can prevent P1 from getting more than v by playing $\hat{Y}$. Hence, a player can gain nothing by deviating from an optimal strategy, and he may lose if he does deviate from it. The minimax theorem settles important questions of the theory, but it does not tell us how to compute optimal strategies.

Next, we show how optimal strategies may be computed by linear programming. Consider the following dual pair of linear programs:*

*This formulation is a variant of the one in Charnes [1].
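$$\max\; v \quad \text{subject to} \quad v - \sum_{i=1}^{m} x_i a_{ij} \le 0 \;\; (j = 1, \dots, n), \qquad \sum_{i=1}^{m} x_i = 1, \qquad x_i \ge 0 \;\; (i = 1, \dots, m) \qquad (10)$$

$$\min\; u \quad \text{subject to} \quad u - \sum_{j=1}^{n} a_{ij} y_j \ge 0 \;\; (i = 1, \dots, m), \qquad \sum_{j=1}^{n} y_j = 1, \qquad y_j \ge 0 \;\; (j = 1, \dots, n) \qquad (11)$$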

In (10) and (11) the matrix $A = (a_{ij})$ is to be interpreted as P1's payoff matrix, and P1 is the maximizing player. Charnes [1] has shown that optimal solutions $\hat{X} = (\hat{x}_1, \dots, \hat{x}_m)$, $\hat{Y} = (\hat{y}_1, \dots, \hat{y}_n)$ and $\hat{u} = \hat{v}$ exist for (10) and (11), and that

$$X^tA\hat{Y} \le \hat{X}^tA\hat{Y} \le \hat{X}^tAY \quad \text{for all strategies } X \text{ and } Y.$$

Therefore, optimal solutions to (10) and (11) correspond to optimal strategies in the matrix game A. Further, when either (10) or (11) is solved, the optimal solutions to the other program are available. Therefore, the value and optimal strategies may be found by solving a single linear program.
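The sketch below solves (10) and (11) for a small hypothetical payoff matrix. For illustration it finds the optimal vertex of each program by exact enumeration of basic solutions rather than by the simplex method, so it is only suitable for very small matrices; the function names are hypothetical. For the matrix with rows (3, -1) and (-2, 1), which has no saddle point, the value is 1/7 with $\hat{X} = (3/7, 4/7)$ and $\hat{Y} = (2/7, 5/7)$.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Solve the square system M z = rhs exactly (Gauss-Jordan); None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def best_vertex(c, G, h):
    """Maximize c.z over {z : G z <= h} by checking every basic solution."""
    best = None
    for rows in combinations(range(len(G)), len(c)):
        z = solve([G[i] for i in rows], [h[i] for i in rows])
        if z is None:
            continue
        if all(sum(g * zk for g, zk in zip(G[i], z)) <= h[i] for i in range(len(G))):
            value = sum(ck * zk for ck, zk in zip(c, z))
            if best is None or value > best[0]:
                best = (value, z)
    return best


def solve_matrix_game(A):
    """Return (v, X, Y, u) from programs (10) and (11); P1 (rows) maximizes."""
    m, n = len(A), len(A[0])
    # (10): variables (x_1..x_m, v); maximize v
    G = [[-A[i][j] for i in range(m)] + [1] for j in range(n)]   # v - sum_i x_i a_ij <= 0
    G += [[1] * m + [0], [-1] * m + [0]]                         # sum_i x_i = 1
    G += [[-1 if k == i else 0 for k in range(m)] + [0] for i in range(m)]  # x_i >= 0
    v, z = best_vertex([0] * m + [1], G, [0] * n + [1, -1] + [0] * m)
    # (11): variables (y_1..y_n, u); minimize u, i.e. maximize -u
    H = [[A[i][j] for j in range(n)] + [-1] for i in range(m)]   # sum_j a_ij y_j - u <= 0
    H += [[1] * n + [0], [-1] * n + [0]]                         # sum_j y_j = 1
    H += [[-1 if k == j else 0 for k in range(n)] + [0] for j in range(n)]  # y_j >= 0
    neg_u, w = best_vertex([0] * n + [-1], H, [0] * m + [1, -1] + [0] * n)
    return v, z[:m], w[:n], -neg_u


A = [[3, -1], [-2, 1]]   # hypothetical payoff matrix with no saddle point
v, X_opt, Y_opt, u = solve_matrix_game(A)
print(v, X_opt, Y_opt, u)
```

As the text states, the two programs return the same number ($\hat{u} = \hat{v}$), and in practice either one solved by the simplex method would also supply the other player's strategy through its dual values.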

To avoid confusion later, we emphasize that P2 does not have to receive the negative of the payoff that P1 receives in order for zero-sum theory to apply. Indeed, we do not postulate negative payments later when the payoff is taken to be a probability or a unit of time. Nevertheless, zero-sum theory applies because one player is attempting to maximize the expected payoff and the other player seeks to minimize it; this is all that is really necessary.

When games are formulated directly in matrix form, zero-sum theory requires that each player be in ignorance of the other player's choice. In most real-world situations, and particularly in tactical encounters, such total ignorance does not prevail. Some intelligence about the opposition, or its habits, is usually known. To incorporate this type of information and other types, we utilize the constrained game formulation of Charnes [1]. This approach is employed in Chapter 2, and we give a complete discussion of it there.

We return to the extensive game form to discuss perfect information and perfect recall. A game has perfect information if each information set contains exactly one node. This means that when each player moves he must know the complete history of the game, including the other player's moves. Of course, it is well known that there exist optimal pure strategies for a game with perfect information.* Intuitively speaking, when a game has perfect information, randomizing is not necessary to hide a player's choice, since it will be disclosed to the other player in subsequent moves. For example, checkers is a game with perfect information.
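Why optimal pure strategies exist under perfect information can be seen through backward induction: working from the terminal branches toward the origin, the player to move at each node picks an alternative that is best for him, so no randomization is ever needed. The sketch below assumes a hypothetical tree encoding in which a terminal branch is a number (the payoff to P1) and any other node is a list of subtrees, with P1 (maximizer) and P2 (minimizer) alternating moves; it returns the value and the alternatives chosen along the resulting play.

```python
def backward_induction(node, p1_to_move=True):
    """Value and optimal play of a finite zero-sum game tree with perfect information.

    Hypothetical encoding: a terminal branch is a number (payoff to P1); any other
    node is a list of subtrees. P1 (maximizer) and P2 (minimizer) alternate moves.
    Returns (value, alternatives chosen along the optimal play, numbered from 1).
    """
    if isinstance(node, (int, float)):
        return node, []
    results = [backward_induction(child, not p1_to_move) for child in node]
    values = [value for value, _ in results]
    best = max(values) if p1_to_move else min(values)
    k = values.index(best)          # a pure choice: no randomization is needed
    return best, [k + 1] + results[k][1]


# P1 chooses one of three alternatives, then P2 (seeing P1's choice) chooses one of two
tree = [[3, 5], [2, 9], [0, 1]]
print(backward_induction(tree))   # (3, [1, 1])
```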

*Refer to von Neumann [2].

The concept of perfect recall was introduced by Kuhn [1], and it is an extension of the notion of perfect information. The fundamental result is that a game of perfect recall has optimal behavior strategies. Roughly speaking, a game has perfect recall if every player remembers which alternative he took in all preceding moves. However, he does not need to be informed of the alternatives which were chosen by the other players.

To illustrate, we can change Example 1 to a game of perfect recall by redefining the information sets for P1 on his second move as follows:


Now on P1's second move he remembers which alternative he has taken on his first move. (Of course, there are also other ways to introduce perfect recall into this game.)

To define a behavior strategy, we assume that P1, say, has n information sets, and we let $X_i$ be a mixed strategy over the alternatives available in information set i (i = 1, ..., n). Then $X = (X_1, X_2, \dots, X_n)$ is a behavior strategy for P1. In the above example, a behavior strategy for P1 is $X = (X_1, X_2, X_3, X_4)$, where $X_i$ is a mixed strategy over the information set $U_i$ (i = 1, 2, 3, 4). For example, $X_3 = (a, 1 - a)$, where a is the probability of choosing alternative 1 in $U_3$. Since P2 only has one move, his behavior strategy is the same as his mixed strategy. It is easy to construct games where one can do better with a mixed strategy than with a behavior strategy. It has been shown by Kuhn [1], on the other hand, that a game of perfect recall always has optimal behavior strategies.

This concludes our brief discussion of game theory and our introductory chapter. We turn to the development of models and methods for ASW tactics.
CHAPTER II - NON-SEQUENTIAL SEARCH GAMES

2.1 Introduction

Two deterministic search games are developed in this chapter. These games are idealizations of tactical situations which arise in antisubmarine warfare. We study searching problems where a hunter-killer force, P1 (player 1), and a submarine, P2 (player 2), are in direct conflict. Specifically, we consider tactical problems where P1 attempts to detect P2 and P2 attempts to avoid detection. Because of the opposing military objectives of detection and evasion, these tactical problems may be formulated as two-person zero-sum games. An appropriate payoff function is defined to reflect the objectives of detection and evasion, and we show how optimal strategies correspond to optimal deployment plans.

We also consider constrained game extensions of the basic search games and thereby allow secondary military objectives in addition to the primary objectives of detection and evasion. In addition, these constrained games permit the players to choose optimal strategies based on intelligence or information on the opposing player's tactics derived, perhaps, from previous attacks. Hence, optimal strategies employed in a particular play may depend on actual information obtained from previous encounters.

Examples are given of particular tactical situations which are encompassed by the games, and these examples serve to illustrate additional features of the models. Special cases are also treated, and in section 2.6 we obtain an analytic solution for a special case of the first game. Both of the games are shown to be generalizations of search games introduced by von Neumann [1].
2.2 Formulation of the n-Cell Game

The n-cell search game is played within a specified search region which is apportioned into n cells numbered i = 1, ..., n. A pure strategy for P1 is a cell to search, and a pure strategy for P2 is a cell in which to hide. Hence, each player has n pure strategies, one corresponding to each cell. A play of the game consists of a simultaneous choice of strategies by the players. Of course, the same game obtains if the players choose their strategies successively, provided that the second choice is made in ignorance of the first.

Now we define an objective and an appropriate payoff function. The primary mission of a hunter-killer force is to seek and destroy submarines (Sternhell and Thorndike [1]). We deal here with the seeking aspect of hunter-killer operations and focus our attention on tactical situations where P1 attempts to detect P2 while P2 attempts to avoid detection. Hence, a reasonable measure of effectiveness for each pair of fixed strategies is the probability that P1 detects P2. To formulate this measure of effectiveness, we postulate the following payoffs. Let $p_{ij}$ (i, j = 1, ..., n) be the conditional probability that P1 detects P2 given that P1 searches cell i and P2 hides in cell j, and let P be the n × n matrix $P = (p_{ij})$. Let the n × 1 vectors $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$ be mixed strategies for P1 and P2 respectively. Now $x_i$ is the probability that P1 searches cell i and $y_j$ is the probability that P2 hides in cell j. Thus, since the two players randomize independently, the probability that P1 detects P2 for the mixed strategies X and Y is simply

$$X^tPY \qquad (1)$$

Equation (1) is the desired measure of effectiveness for the type of hunter-killer operations under consideration. We assume that P1 is the maximizing player and P2 the minimizing player.

The celebrated minimax theorem of von Neumann [2] establishes the existence of strategies $\hat{X}$ and $\hat{Y}$ and a real number v (the value of the game) which satisfy

$$X^tP\hat{Y} \le v = \hat{X}^tP\hat{Y} \le \hat{X}^tPY \quad \text{for all strategies } X \text{ and } Y, \qquad (2)$$

or, equivalently,

$$v = \max_X \min_Y X^tPY = \min_Y \max_X X^tPY = \hat{X}^tP\hat{Y}. \qquad (3)$$

From equation (2), if P1 plays an optimal strategy $\hat{X}$, then the total payoff (probability of detection) is at least as great as v regardless of P2's strategy. Similarly, if P2 plays an optimal strategy $\hat{Y}$, the total payoff is no greater than v regardless of P1's strategy. It follows that P1 can choose a strategy to maximize the probability of detection while P2 simultaneously minimizes it. Hence, the conflicting objectives of detection and evasion are embodied in the given two-person zero-sum game formulation.
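Because $X^tPY$ is linear in each strategy, inequality (2) can be verified against pure strategies only: $\hat{X}$ guarantees at least v if every entry of the row vector $\hat{X}^tP$ is at least v, and $\hat{Y}$ holds P1 to at most v if every entry of $P\hat{Y}$ is at most v. The sketch below performs this check, with exact fractions, for a hypothetical 2-cell detection matrix in which P1 detects P2 in the other cell with probability 0.1; for this matrix $\hat{X} = \hat{Y} = (3/11, 8/11)$ and v = 23/110.

```python
from fractions import Fraction as F

# Hypothetical 2-cell game: p_ij = Pr(P1 detects P2 | P1 searches i, P2 hides in j)
P = [[F(1, 2), F(1, 10)],
     [F(1, 10), F(1, 4)]]


def detection_prob(X, P, Y):
    """Equation (1): X^t P Y for mixed strategies X (searcher) and Y (hider)."""
    return sum(X[i] * P[i][j] * Y[j] for i in range(len(X)) for j in range(len(Y)))


def satisfies_saddle(X, Y, v, P):
    """Check inequality (2) against pure strategies only.

    X^t P Y is linear in Y, so X^t P Y >= v for every mixed Y iff every entry of
    X^t P is >= v; likewise X^t P Y <= v for every X iff every entry of P Y is <= v.
    """
    n = len(P)
    vs_hiding_cell = [sum(X[i] * P[i][j] for i in range(n)) for j in range(n)]
    vs_search_cell = [sum(P[i][j] * Y[j] for j in range(n)) for i in range(n)]
    return (min(vs_hiding_cell) >= v and max(vs_search_cell) <= v
            and detection_prob(X, P, Y) == v)


X_hat = [F(3, 11), F(8, 11)]
# P is symmetric and X_hat equalizes both cells, so the same vector works for P2 here
Y_hat = [F(3, 11), F(8, 11)]
v = F(23, 110)
print(satisfies_saddle(X_hat, Y_hat, v, P))                # True
print(satisfies_saddle([F(1, 2), F(1, 2)], Y_hat, v, P))   # False
```

Searching both cells equally often fails the check: P2 can then hide in cell 2 and hold the detection probability to 7/40, which is below the value.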

An important feature of the n-cell search game is that P1 may detect P2 with positive probability while searching any cell of the search region, not only the cell in which P2 hides. This feature permits us to consider tactical situations in which the probability of detection is a function of the range between P1 and P2. Variation of detection probability with range is a basic property of detection devices,* but none of the search models referenced in the bibliography permit this variation. They assume, either explicitly or implicitly, that the radius of detection is negligible. This assumption for the n-cell game requires P to be a diagonal matrix, and we take up this special case in section 2.6.

*See Morse and Kimball [1].
2.3 Solution and Tactical Interpretation

We present a method to compute the solution to the n-cell search game. No computational advantage is gained from the fact that the payoff elements are probabilities. Hence, we employ a computational method developed for a general two-person zero-sum game.

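Optimal strategies and the value may be computed, and are characterized, by the following dual linear programs due to Charnes [1]:

$$\max\; v \quad \text{subject to} \quad v - \sum_{i=1}^{n} x_i p_{ij} \le 0 \;\; (j = 1, \dots, n), \qquad \sum_{i=1}^{n} x_i = 1, \qquad x_i \ge 0 \;\; (i = 1, \dots, n) \qquad (4)$$

$$\min\; u \quad \text{subject to} \quad u - \sum_{j=1}^{n} p_{ij} y_j \ge 0 \;\; (i = 1, \dots, n), \qquad \sum_{j=1}^{n} y_j = 1, \qquad y_j \ge 0 \;\; (j = 1, \dots, n) \qquad (5)$$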


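Let $\hat{X} = (\hat{x}_1, \dots, \hat{x}_n)$, $\hat{Y} = (\hat{y}_1, \dots, \hat{y}_n)$ and $\hat{u} = \hat{v}$ be an optimal solution to problems (4) and (5). Then from Charnes [1]

$$X^tP\hat{Y} \le \hat{v} = \hat{X}^tP\hat{Y} \le \hat{X}^tPY \quad \text{for all strategies } X \text{ and } Y.$$

Hence, $\hat{X}$ and $\hat{Y}$ are optimal strategies and $\hat{v}$ is the value of the game (see equation (2)).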

We bring together the following assumptions which have been made and examine them in light of their tactical consequences.

1. Both players know the game is being played.

2. Both players are given the search region and the particular subdivision of it into cells, i.e., they know what pure strategies are available.

3. Both players are given the payoff matrix.

4. P1 acts to maximize the probability of detecting P2, and P2 acts to minimize this probability.

We study the above assumptions from P1's point of view when P1 has all of the information necessary to satisfy them and P2 has part, or perhaps none, of the information. Hence, suppose that P1 goes out and specifies a search region of interest and divides it into cells. P1 will know the detection characteristics of his own searching equipment and can therefore construct a payoff matrix. Further, assume that P1 wishes to maximin the probability of detecting P2. Then the above assumptions are satisfied for P1. Now, it is unlikely that P2 will also have all of the information required by assumptions 1 through 4. Thus, due to ignorance of the essentials of the game, P2 may not play an optimal strategy. Nevertheless, if P1 plays an optimal strategy, then P2 is detected with probability at least as great as the value of the game. For practical purposes, it is therefore immaterial whether P2 has all of the information required by assumptions 1 through 4. The important point is that when P1 plays an optimal strategy he is acting as if P2 does have all of the required information, and P2 may, in fact, have a substantial amount of it.

Consider the situation where P1 employs an optimal strategy for several plays of the game and his strategy is discovered by P2. Then from equation (2), P2 cannot take advantage of the fact that he has discovered P1's optimal strategy, because P1 will detect P2 with probability at least as great as the value of the game regardless of P2's strategy. Therefore, P1 can employ the same optimal strategy throughout several plays of the game without risking adverse consequences.

Finally, we give an interpretation of optimal strategies for P1 in terms of optimal search plans. A search plan, in the sense used here, is a specified configuration of the available search effort, or equivalently the amount of effort which is to be assigned to each cell. If P1 has a single unit of effort which is indivisible, then he may play the optimal strategy $\hat{X}$ by searching cell i (i = 1, ..., n) with relative frequency $\hat{x}_i$. Of course, these relative frequencies may be realized over several plays of the game by selecting a pure strategy for each play at random from the distribution $\hat{X}$. Now suppose that P1 has a total amount of effort E available which is infinitely divisible. For example, E may be the number of flying hours available for searching, which is approximately infinitely divisible. In this case, P1 may allocate the amount of effort $\hat{x}_iE$ to cell i (i = 1, ..., n), and this allocation is optimal with respect to the game model. Hence, an optimal search strategy may correspond to an optimal search plan.
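The two ways of playing $\hat{X}$ described above can be sketched as follows, using a hypothetical 3-cell optimal strategy and a hypothetical effort budget of E = 120 flying hours. Divisible effort is split in proportion to $\hat{x}_i$; indivisible effort is handled by drawing the searched cell at random from $\hat{X}$ on each play with `random.Random.choices` from the Python standard library.

```python
import random

# Hypothetical optimal strategy for a 3-cell game and a hypothetical effort budget
X_hat = [0.2, 0.5, 0.3]
E = 120.0   # e.g. flying hours, treated as infinitely divisible

# Divisible effort: allocate x_i * E to cell i
allocation = [x * E for x in X_hat]

# Indivisible effort: on each play, draw the cell to search at random from X_hat
rng = random.Random(0)
cells = range(len(X_hat))
plays = [rng.choices(cells, weights=X_hat)[0] for _ in range(10_000)]
freq = [plays.count(i) / len(plays) for i in cells]

print(allocation)
print(freq)   # relative frequencies should be close to X_hat
```

The random draw matters for the indivisible case: a fixed, predictable rotation through the cells would let P2 avoid the cell about to be searched.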
2.4 Tactical Examples

We give a tactical example of the n-cell game model. Suppose that submarines (P2) must pass through a channel to get from their bases to operating areas. P1 wishes to set up a patrol barrier across the channel to detect submarines as they pass through.* The patrol barrier will consist of a linear array of detection devices across the channel. Thus, the searching region is a straight line, and P1 divides this line into 15 cells, as shown in Figure 2.1. P1 would like to determine an optimal allocation of detection devices, that is, one which maximizes the minimum probability of detecting P2.

Each detection device has a probability of detection versus range curve² as given in Figure 2.2. The payoff matrix can now be constructed from Figure 2.2. For example, if P1 searches cell 5 and P2 hides in cell 8, then the range is three cells and, from Figure 2.2, p_58 = 0.367. The complete P matrix is given in Table 2.1.
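Since the payoff depends only on the range |i - j| between the searched cell and the hiding cell, the matrix P can be generated from a table of the detection curve. In the sketch below the function name is hypothetical and only the range-3 value 0.367 comes from the text; the other entries of `curve` are placeholders, not readings of Figure 2.2. Ranges beyond the end of the table are assumed to give zero detection.

```python
def barrier_payoff(n, detection_by_range):
    """P[i][j] = detection probability when P1 searches cell i+1 and P2 hides
    in cell j+1, read from a detection-versus-range table indexed by the
    range in cells; ranges past the end of the table give 0."""
    max_range = len(detection_by_range) - 1
    return [[detection_by_range[abs(i - j)] if abs(i - j) <= max_range else 0.0
             for j in range(n)] for i in range(n)]

# Placeholder curve: only curve[3] = 0.367 is quoted in the text.
curve = [0.90, 0.75, 0.55, 0.367, 0.20, 0.08]
P = barrier_payoff(15, curve)
print(P[4][7])   # P1 searches cell 5, P2 hides in cell 8 (range 3)
```

The resulting matrix is symmetric and, for the end cells, has fewer non-zero entries per row, which is the source of the reduced effectiveness near the ends of the barrier.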

Now we have all of the information required to solve the n-cell game. The value and optimal strategies for this example were found by solving linear program (4) with the data of Table 2.1. A CDC 1604 computer and a standard linear programming code were used to effect the computations. P1 has the unique optimal strategy displayed in Table 2.2 and plotted in Figure 2.3. P2 has two optimal extreme point strategies, and they also appear in Table 2.2 and Figure 2.3.

* This type of situation was encountered in the Bay of Biscay during World War II; see Sternhell and Thorndike [1].

² This is a "typical" curve according to Morse and Kimball [1].
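For readers who want to reproduce a computation of this kind on a small matrix, the sketch below solves a matrix game by linear programming. It is not the CDC 1604 code, and the function name is hypothetical. It assumes that (4) is the usual maximin program (maximize v subject to v - Σ_i x_i p_ij ≤ 0, Σ_i x_i = 1, x_i ≥ 0) and uses the standard equivalent normalization: with all entries positive, maximize Σ_j y'_j subject to Σ_j p_ij y'_j ≤ 1, y' ≥ 0; the value is 1/Σ_j y'_j, P2's strategy is y' scaled by the value, and P1's strategy is read from the slack reduced costs. Entries are shifted by a constant when some are non-positive, which leaves the optimal strategies unchanged. Exact fractions and Bland's rule keep it simple; it is only meant for small matrices such as a 15 × 15 one.

```python
from fractions import Fraction

def solve_search_game(P):
    """Value and optimal strategies of the matrix game in which P1 (rows,
    maximizer) receives P[i][j]; dense simplex tableau with Bland's rule."""
    m, n = len(P), len(P[0])
    # Shift so every entry is positive; optimal strategies are unchanged.
    low = min(min(row) for row in P)
    shift = Fraction(1) - Fraction(low) if low <= 0 else Fraction(0)
    A = [[Fraction(p) + shift for p in row] for row in P]
    # Tableau for: maximize sum(y') subject to A y' <= 1, y' >= 0.
    T = [A[i] + [Fraction(int(i == k)) for k in range(m)] + [Fraction(1)]
         for i in range(m)]
    obj = [Fraction(-1)] * n + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]            # slack variables start basic
    while True:
        # Entering column: smallest index with a negative reduced cost.
        col = next((c for c in range(n + m) if obj[c] < 0), None)
        if col is None:
            break
        # Leaving row: minimum ratio, ties broken by smallest basic index.
        row = min((r for r in range(m) if T[r][col] > 0),
                  key=lambda r: (T[r][-1] / T[r][col], basis[r]))
        piv = T[row][col]
        T[row] = [a / piv for a in T[row]]
        for r in range(m):
            if r != row and T[r][col] != 0:
                f = T[r][col]
                T[r] = [a - f * b for a, b in zip(T[r], T[row])]
        f = obj[col]
        obj = [a - f * b for a, b in zip(obj, T[row])]
        basis[row] = col
    value_shifted = 1 / obj[-1]                  # obj[-1] = sum of y'
    y = [Fraction(0)] * n
    for r, b in enumerate(basis):
        if b < n:
            y[b] = T[r][-1] * value_shifted
    # P1's strategy comes from the reduced costs of the slack columns.
    x = [obj[n + i] * value_shifted for i in range(m)]
    return value_shifted - shift, x, y

value, x, y = solve_search_game([[3, 1], [1, 2]])
print(value, x, y)   # 5/3, P1 plays (1/3, 2/3), P2 plays (1/3, 2/3)
```

A production code would use a floating-point LP solver instead; exact fractions are used here only so the small example can be checked by hand.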

Let Y_1 and Y_2 denote P2's optimal extreme point strategies. Then, from linear programming theory, the strategy

$$ Y = \lambda Y_1 + (1 - \lambda) Y_2, \qquad 0 \le \lambda \le 1 \tag{6} $$

is also optimal. From the symmetry of the payoff matrix, we might expect P1 and P2 to have optimal strategies symmetric about the middle cell (cell 8). Indeed, P1's optimal strategy is symmetric about cell 8, and for λ = 1/2 in equation (6), P2 also has an optimal strategy symmetric about cell 8.
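The symmetric mixture can be checked directly from the data of Table 2.2. In the sketch below the helper names are hypothetical; cell j and cell 16 - j are mirror images about cell 8.

```python
def mix(y1, y2, lam):
    """Convex combination lam*Y1 + (1 - lam)*Y2 of two strategies, equation (6)."""
    return [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]

def as_vector(support, n=15):
    """Expand a {cell: probability} table (cells 1..n) into a length-n list."""
    return [support.get(j, 0.0) for j in range(1, n + 1)]

# P2's optimal extreme point strategies, from Table 2.2.
Y1 = as_vector({1: 0.258, 5: 0.069, 6: 0.165, 10: 0.234, 15: 0.274})
Y2 = as_vector({1: 0.274, 6: 0.234, 10: 0.165, 11: 0.069, 15: 0.258})
Y = mix(Y1, Y2, 0.5)
# Symmetry about cell 8: cells j and 16 - j carry the same probability.
symmetric = all(abs(Y[j - 1] - Y[15 - j]) < 1e-12 for j in range(1, 16))
print(symmetric, Y[0], Y[4], Y[5])
```

Any other λ in [0, 1] also gives an optimal strategy, but only λ = 1/2 balances the weights on cells 5 and 11 and on cells 6 and 10.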

Notice that a unit of P1's effort in cell 1 or 15 has only one-half the probability of detecting P2 that a unit in cell 8 has. However, about sixty-three percent of P1's effort is assigned to cells near the ends of the search region (cells 2 and 14). In a sense, then, P1 compensates for the decreased effectiveness per unit near the ends by assigning a large percentage of effort to these cells.
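The sixty-three percent figure is simply the sum of P1's probabilities on cells 2 and 14 in Table 2.2, as the following arithmetic check shows (variable names are illustrative only).

```python
# P1's optimal strategy from Table 2.2 (cells not listed have x_i = 0).
X = {2: 0.317, 6: 0.099, 8: 0.168, 10: 0.099, 14: 0.317}
end_share = X[2] + X[14]          # effort placed next to the ends of the barrier
is_symmetric = all(X.get(i, 0.0) == X.get(16 - i, 0.0) for i in range(1, 16))
total = sum(X.values())
print(end_share, is_symmetric, total)   # about 0.634, True, about 1.0
```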


Player 1's Payoff Matrix
Table 2.1


          P2's Optimal Basic Strategies                  P1's Optimal Strategy
          Y_1                      Y_2
      j      y_j               j      y_j                  i      x_i
      1      0.258             1      0.274                2      0.317
      5      0.069             6      0.234                6      0.099
      6      0.165            10      0.165                8      0.168
     10      0.234            11      0.069               10      0.099
     15      0.274            15      0.258               14      0.317
     all other y_j = 0        all other y_j = 0           all other x_i = 0

Optimal Strategies
Table 2.2


Plots of the Optimal Strategies
Figure 2.3
Other tactical examples which fit the n-cell model are also immediately available. First, the search region may be a rectangular array of cells. In this case, an optimal strategy for P1 is likely to require P1 to play the cells on the edges of the search region with higher probability than the center cells. Intuitively, this would prevent P2 from hiding in the edge cells, where the detection capability is lower than in the center cells.

We might also study situations where the search region is three-dimensional. The effect of the depth of P2 on the probability of detection can thereby be taken into account. If P1 is using surface detection devices then, of course, P1's pure strategies would include only the surface cells, and P2 could choose any cell in the three-dimensional region. This is a slight variation on the n-cell game in which P1 and P2 do not have identical sets of pure strategies.
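Both variants only change how the payoff matrix is built. The sketch below assumes, purely for illustration, that detection depends on the Euclidean distance between cell indices through some function `detect`; the function name `surface_search_payoff` and the linear detection curve in the example are hypothetical. P1's rows are the surface cells (depth index 0) and P2's columns are all cells, so the matrix is rectangular; with a single depth layer it reduces to the two-dimensional rectangular array.

```python
import math
from itertools import product

def surface_search_payoff(nx, ny, nz, detect):
    """Payoff matrix when P1 may search only surface cells (depth index 0) of
    an nx-by-ny-by-nz region and P2 may hide in any cell.  `detect(d)` is a
    hypothetical detection-versus-range function of the Euclidean distance d
    between cell indices."""
    surface = [(i, j, 0) for i, j in product(range(nx), range(ny))]
    region = list(product(range(nx), range(ny), range(nz)))
    P = [[detect(math.dist(s, c)) for c in region] for s in surface]
    return surface, region, P

detect = lambda d: max(0.0, 0.9 - 0.3 * d)       # invented detection curve
surface, region, P = surface_search_payoff(2, 2, 3, detect)
print(len(P), len(P[0]))                          # 4 surface rows, 12 hiding cells
```

The rectangular matrix can be passed to the same linear programs as the square n-cell matrix.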

Next, we extend the n-cell game to include two or more types of detection devices for P1. For example, the searcher may have aircraft and ships available, and the probability of detection versus range curve may differ significantly between ships and aircraft. For the sake of discussion, suppose that only two types of detection devices are to be used. Call these devices type 1 and type 2. Now the detection probability of interest is the joint conditional probability p_ikj, where p_ikj is the probability that P1 detects P2 given that type 1 is located in cell i (i = 1, ..., n), type 2 is located in cell k (k = 1, ..., n), and P2 hides in cell j (j = 1, ..., n). This game can be placed in the framework of the n-cell game by simply changing the pure strategies for P1. Let a pure strategy for P1 be the two-tuple (i, k), where i denotes the location of type 1 and k denotes the location of type 2. Now we may construct the payoff matrix and solve for the optimal strategies and value by reference to linear programs (4) and (5). This example may be extended to handle more than two types of equipment.
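The change of pure strategies can be sketched as follows. The text leaves p_ikj general; here, as an illustrative assumption only, the two devices are taken to detect independently, so p_ikj = 1 - (1 - f_1(|i - j|))(1 - f_2(|k - j|)) for hypothetical range curves f_1 and f_2. P1's pure strategies are the n² pairs (i, k), so the payoff matrix has n² rows and n columns.

```python
from itertools import product

def two_device_payoff(n, detect1, detect2):
    """Rows: P1's pure strategies (i, k), a type 1 device in cell i and a type 2
    device in cell k.  Columns: P2's cells j.  Assumes (for illustration) that
    the devices detect independently:
    p_ikj = 1 - (1 - detect1(|i - j|)) * (1 - detect2(|k - j|))."""
    strategies = list(product(range(1, n + 1), repeat=2))
    P = [[1 - (1 - detect1(abs(i - j))) * (1 - detect2(abs(k - j)))
          for j in range(1, n + 1)] for i, k in strategies]
    return strategies, P

f1 = lambda r: 0.5 if r == 0 else 0.0     # invented curve for type 1
f2 = lambda r: 0.4 if r <= 1 else 0.0     # invented curve for type 2
strategies, P = two_device_payoff(3, f1, f2)
print(len(strategies), P[0])              # 9 pure strategies; row for (1, 1)
```

With t device types the number of P1's pure strategies grows as n^t, which is the practical limit of this extension.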


2.5 Extension to a Constrained Game

We extend the n-cell game to accommodate the following types of situations.

1. A player has information on the cells which his opponent can choose.

2. A player restricts his own choice of cells.

The above statements are necessarily broad in order to include a variety of tactical problems. Some of these problems are outlined below. As we shall see, cases 1 and 2 are formally embodied in the elegant notion of a constrained game due to Charnes [1].

We discuss case 1 for information which P1 may have on P2's location. Analogous statements hold when P2 has information on P1. Suppose P1 obtains a contact with P2 and subsequently loses the contact. Then P1 knows that P2 must be located in some subset I of the set of all cells, where I is determined by the position of the last contact and the elapsed time since the contact. Hence, the following constraints on P2's strategies are obtained:

$$ y_j = 0 \qquad \text{for } j \notin I $$

More detailed constraints on P2's strategies may also be written. Morse and Kimball [1] give a theoretical probability distribution of P2's location as a function of elapsed time since the last contact. From this distribution, we can calculate bounds L_j and U_j with

$$ L_j \le y_j \le U_j, \qquad j = 1, \ldots, n \tag{7} $$

The above restrictions may also arise from intelligence reports or P1's a priori estimates of P2's location.

We discuss two extreme cases of information for P1:

1. perfect information,

2. no information.

In case 1, P1 knows the strategy which P2 will employ. Of course, P1 will then have an optimal pure strategy. Furthermore, perfect information corresponds to L_j = U_j, j = 1, ..., n, in equation (7). Case 2 is the unconstrained n-cell search game. Here we have L_j = 0, U_j = 1, j = 1, ..., n, in equation (7). In many ASW situations, the information which is available will lie between these two extremes. Such intermediate cases yield to a constrained game formulation.

A situation where P1 may restrict his own choice of cells arises when he has a secondary military objective in addition to the primary objective of detecting P2. For instance, P1 may wish to provide at least a certain level of protection for some set of cells I in the search region because there is a convoy in this set of cells. Then constraints of the following form arise:

$$ \sum_{i \in I} x_i \ge c $$

where c is the desired level of minimum protection.
We have discussed a few situations in which constraints on the players' strategies arise naturally. These situations and others are included in the following sets of constraints on P1's and P2's strategies, respectively:

$$ \sum_{i=1}^{n} x_i c_{is} \le c_s, \qquad s = 1, \ldots, S \tag{8} $$

$$ \sum_{j=1}^{n} b_{rj} y_j \ge b_r, \qquad r = 1, \ldots, R \tag{9} $$

(A requirement of the opposite sense, such as the protection constraint above, is brought into form (8) by multiplying it by -1.) The following constrained formulation due to Charnes [1] is employed to deal with the types of tactical situations under consideration. It includes the constraints (8) and (9).


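$$
\begin{aligned}
\max\quad & v + \sum_{r} z_r b_r \\
\text{subject to}\quad & v + \sum_{r} z_r b_{rj} - \sum_{i} x_i p_{ij} \le 0, \qquad j = 1, \ldots, n \\
& \sum_{i} x_i = 1 \\
& \sum_{i} x_i c_{is} \le c_s, \qquad s = 1, \ldots, S \\
& z_r,\ x_i \ge 0
\end{aligned}
\tag{10}
$$

$$
\begin{aligned}
\min\quad & u + \sum_{s} c_s w_s \\
\text{subject to}\quad & u + \sum_{s} c_{is} w_s - \sum_{j} p_{ij} y_j \ge 0, \qquad i = 1, \ldots, n \\
& \sum_{j} y_j = 1 \\
& \sum_{j} b_{rj} y_j \ge b_r, \qquad r = 1, \ldots, R \\
& w_s,\ y_j \ge 0
\end{aligned}
\tag{11}
$$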

Let quantities with a "hat" over them denote part of an optimal solution to (10) and (11). Charnes obtained the following results:

$$ \hat v + \sum_r \hat z_r b_r = \sum_i \sum_j \hat x_i p_{ij} \hat y_j = \hat u + \sum_s c_s \hat w_s \tag{12} $$

$$ \sum_i \sum_j x_i p_{ij} \hat y_j \;\le\; \sum_i \sum_j \hat x_i p_{ij} \hat y_j \;\le\; \sum_i \sum_j \hat x_i p_{ij} y_j \quad \text{for all strategies } X \text{ satisfying (8) and } Y \text{ satisfying (9)} \tag{13} $$

Equations (12) and (13) establish the existence of a value and optimal strategies for the constrained game. Of course, the value and optimal strategies may be computed from the above linear programs.

We compare the constrained n-cell game to the unconstrained game. Suppose that P1 can impose constraints on P2's strategies but there are no constraints on P1's strategies. Then the value of the constrained game is no smaller than the unconstrained value. This fact follows from problems (10) and (11), since the unconstrained value can be attained by the objective function of (10) with all z_r = 0. Hence, if P1 can determine constraints on P2's strategies without imposing constraints on his own strategies, the probability of detecting P2 cannot decrease and may increase. Analogous remarks hold for player 2. See Charnes and Cooper [5] and Sakaguchi [1], [2] for more details on this subject.
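The effect of constraining P2 is easy to see in the diagonal (negligible radius of detection) case of section 2.6, where the value has the closed form v = 1 / Σ_i (1/p_ii). If P1 knows that y_j = 0 outside a set of cells I, only those cells enter the sum, so the value can only grow. The function name and the diagonal entries below are hypothetical, and cell indices in the sketch start at 0.

```python
from fractions import Fraction

def diagonal_value(p_diag, cells=None):
    """Value of the n-cell game with diagonal payoff matrix (section 2.6):
    v = 1 / sum(1 / p_ii).  If P2 is known to hide only in `cells`
    (0-based indices; y_j = 0 elsewhere), only those cells enter the sum."""
    idx = range(len(p_diag)) if cells is None else cells
    return 1 / sum(Fraction(1) / Fraction(p_diag[i]) for i in idx)

p = [1, Fraction(1, 2), Fraction(1, 4)]    # invented diagonal entries p_ii
print(diagonal_value(p))                   # unconstrained value 1/7
print(diagonal_value(p, cells=[0, 1]))     # P2 confined to two cells: 1/3
```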

Finally, we discuss an adaptive manner of employing constrained games. If P1 obtains additional information on P2's location in a particular play of the game, then P1's optimal strategy is likely to change for the next play. On the other hand, if no additional information is obtained, then P1 will have the same optimal strategy for the next play. Of course, these remarks also hold for P2's optimal strategies. In the constrained version of the n-cell game, the players can choose their optimal strategies adaptively, with the optimal strategies for a particular play depending on the actual information which is obtained in preceding plays. However, the constrained model does not include the evaluation of the possible future consequences of a strategy, and thus the game is non-sequential in nature.
2.6 A Special Case: Negligible Radius of Detection

In section 2.2 we mentioned that some of the existing search models assume a negligible radius of detection. For the n-cell game, this assumption requires P to be a diagonal matrix, for then P1 can detect P2 with non-zero probability only if P1 searches the cell in which P2 is hiding. This special case was first proposed and solved by von Neumann [1]. We give an alternate derivation which is equivalent to von Neumann's proof.

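We assume that P is a diagonal matrix, i.e., p_ij = 0 for i ≠ j. We also assume without loss of generality that p_ii > 0, i = 1, ..., n. For if p_ii = 0 for some i, then the game has a saddle point in pure strategies (P2 hides in cell i and the value is zero), and we exclude this trivial case. Under the above two assumptions, the dual linear programs (4) and (5) which characterize the solution become:

$$
\begin{aligned}
\max\quad & v \\
\text{subject to}\quad & v - x_i p_{ii} \le 0, \qquad i = 1, \ldots, n \\
& \sum_i x_i = 1 \\
& x_i \ge 0
\end{aligned}
\tag{14}
$$

$$
\begin{aligned}
\min\quad & u \\
\text{subject to}\quad & u - p_{jj} y_j \ge 0, \qquad j = 1, \ldots, n \\
& \sum_j y_j = 1 \\
& y_j \ge 0
\end{aligned}
\tag{15}
$$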


Now we find an analytic solution to (14) and (15). Let v̂, X̂ = (x̂_1, ..., x̂_n) be an optimal solution to (14). From the inequality constraints in (14), v̂ must satisfy

$$ \hat v = \max_{X} \min_{i} x_i p_{ii} = \min_{i} \hat x_i p_{ii} \tag{16} $$

Notice that for X̂ to be optimal we must have

$$ p_{ii} \hat x_i = c, \qquad i = 1, \ldots, n \tag{17} $$

For if (17) does not hold, then some cell k has p_kk x̂_k strictly larger than the minimum in (16); shifting a small amount of probability from cell k to the cells which attain the minimum raises that minimum, and hence yields a larger v. From (17),

$$ \hat x_i = \frac{c}{p_{ii}}, \qquad i = 1, \ldots, n $$

where c is chosen to ensure that Σ_{i=1}^{n} x̂_i = 1, i.e.,

$$ \frac{1}{c} = \sum_{i=1}^{n} \frac{1}{p_{ii}} $$

From (16) and (17) we have v̂ = c. A similar analysis shows that X̂ is also an optimal strategy for player 2. Returning to the numerical example of section 2.4, we find that if P is a diagonal matrix, then X̂ and Ŷ are uniform distributions. This result clearly points out the limited applicability to ASW of negligible radius of detection models.
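The closed-form solution can be written in a few lines. This is a sketch with a hypothetical function name; exact fractions are used so that the results can be compared with hand calculation. When all p_ii are equal, as when only the zero-range detection probability of the section 2.4 example is retained, both strategies come out uniform.

```python
from fractions import Fraction

def solve_diagonal_game(p_diag):
    """Closed-form solution of the n-cell game with diagonal P and p_ii > 0:
    1/c = sum_i 1/p_ii, x_i = y_i = c / p_ii, and the value is c."""
    c = 1 / sum(Fraction(1) / Fraction(p) for p in p_diag)
    strategy = [c / Fraction(p) for p in p_diag]
    return c, strategy, strategy          # value, X-hat, Y-hat

value, x, y = solve_diagonal_game([1, Fraction(1, 2)])
print(value, x)                           # 1/3 and [1/3, 2/3]
print(solve_diagonal_game([0.6] * 4)[1])  # equal p_ii: uniform strategy
```

Cells that are harder to search (small p_ii) receive more of P1's effort, and P2 mirrors this by hiding in them more often.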
2.7 Formulation of the Row-Column Search Game

For the sake of discussion, we call the next search game of interest the row-column search game. This game is similar to the n-cell game in that (1) the searching region is divided into cells, (2) the payoff is a probability of detection, and (3) it is a two-person zero-sum game with P1 the maximizing player. The row-column game differs from the n-cell game in the manner in which player 1 conducts the search, and therefore different tactical situations are represented.

Now the game is formulated. As before, a pure strategy for P2 is a cell in which to hide, but now the cells are doubly indexed (i, j), i = 1, ..., m, j = 1, ..., n. A pure strategy for P1 is the choice of an index i or j. If the cells are thought of as positions in an m × n matrix, then P2 chooses a position and P1 chooses a row or column. These choices are made simultaneously and constitute a play of the game.

The row-column game is especially useful for studying certain types of search situations. A typical situation occurs when searching is conducted in sweeps and the speed of the searching craft is substantially faster than the speed of the submarine, for then P1 can search an entire row or column while P2 stays in one cell. One example of the row-column game is therefore search in "sweeps" by aircraft. On the other hand, the n-cell model is better suited to search by slow craft, such as ships, because each player chooses only one cell in a play of the game.
As in the n-cell game, the payoff function will be the probability that P1 detects P2 (the probability of detection). However, for the present game, the probability of detection is taken to be the product of the probability that P1 contacts P2 (contact probability) and the conditional probability that P1 identifies the contact as P2 given that a contact has been made (identification probability). We could simply postulate a detection probability with the understanding that both contact and identification probabilities are included. However, we postulate the two probabilities separately to exhibit certain predominant features of this game.

If P2 hides in cell (i, j), then P1 may contact P2 with positive probability only if P1 searches row i or column j. But the contact probability itself may vary along a particular row or column. This variation may be due to differences in water temperature, salinity, bottom conditions, and a host of other factors. The identification probability, in turn, may depend on the row or column searched, because of the detection equipment or crew proficiencies of the search craft which are available for a particular row or column.

To write down the payoffs formally, suppose that P2 hides in cell (i, j) and P1 searches row i. Then the contact probability is p_ij and the identification probability is a_i; therefore, the probability that P1 detects P2 is a_i p_ij. Similarly, if P2 hides in cell (i, j) and P1 searches column j, the contact probability is q_ij and the identification probability is b_j. The probability of detection is now b_j q_ij. Of course, the above probabilities are defined for all i and j.

From the above definitions of the payoff elements, we construct player 1's payoff matrix, Table 2.3.


Player 1's Payoff Matrix (columns: Player 2's pure strategies)
Table 2.3
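The structure of Table 2.3 follows directly from the definitions above: a row sweep i detects P2 in cell (i, j) with probability a_i p_ij, a column sweep j with probability b_j q_ij, and every other entry is zero. The sketch below (hypothetical function name, invented sample numbers) builds this (m + n) × mn matrix, with P1's row sweeps listed before his column sweeps and P2's cells in row-major order.

```python
def row_column_payoff(a, b, p, q):
    """Payoff matrix of the row-column game.  Rows: P1 sweeps row 1..m, then
    column 1..n.  Columns: P2's cells (1,1), (1,2), ..., (m,n) in row-major
    order.  A sweep that does not pass through P2's cell pays 0."""
    m, n = len(a), len(b)
    cells = [(i, j) for i in range(m) for j in range(n)]
    rows = [[a[r] * p[i][j] if i == r else 0.0 for i, j in cells]
            for r in range(m)]
    cols = [[b[c] * q[i][j] if j == c else 0.0 for i, j in cells]
            for c in range(n)]
    return rows + cols

# Invented data: 2 rows, 3 columns.
M = row_column_payoff([0.9, 0.8], [0.5, 0.6, 0.7],
                      [[0.5] * 3] * 2, [[1.0] * 3] * 2)
print(len(M), len(M[0]))    # 5 pure strategies for P1, 6 cells for P2
```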
2.8 Solution and Reduction to a Dyadic Model

Now that we have the payoff matrix, the game may be readily solved. Let y_ij be the probability that P2 chooses cell (i, j), i = 1, ..., m, j = 1, ..., n. Also, let u_i be the probability that P1 chooses row i (i = 1, ..., m), and let v_j be the probability that P1 chooses column j (j = 1, ..., n). Let Y be the mn × 1 vector Y = (y_11, y_12, ..., y_mn)^T, and let U = (u_1, ..., u_m) and V = (v_1, ..., v_n). The following dual linear programs,* which characterize the game solution, are obtained directly from the payoff matrix:

$$
\begin{aligned}
\max\quad & \rho \\
\text{subject to}\quad & \rho - a_i p_{ij} u_i - b_j q_{ij} v_j \le 0, \qquad i = 1, \ldots, m;\ j = 1, \ldots, n \\
& \sum_i u_i + \sum_j v_j = 1 \\
& u_i,\ v_j \ge 0
\end{aligned}
\tag{18}
$$

$$
\begin{aligned}
\min\quad & \sigma \\
\text{subject to}\quad & \sigma - a_i \sum_j p_{ij} y_{ij} \ge 0, \qquad i = 1, \ldots, m \\
& \sigma - b_j \sum_i q_{ij} y_{ij} \ge 0, \qquad j = 1, \ldots, n \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} y_{ij} = 1 \\
& y_{ij} \ge 0
\end{aligned}
\tag{19}
$$

Let ρ̂, σ̂, (Û, V̂), Ŷ be part of an optimal solution to (18) and (19). Then, from Charnes [1], (Û, V̂) and Ŷ are optimal strategies for players 1 and 2, respectively, and ρ̂ is the value.

To obtain additional insights into the row-column game, we transform problem (19) to a dyadic model. We make the following two assumptions and show that they do not result in loss of generality.

(i) p_ij, q_ij > 0 for all i, j
(ii) a_i, b_j > 0 for all i, j

Now a_i and b_j are probabilities; therefore, 0 ≤ a_i ≤ 1 and 0 ≤ b_j ≤ 1. Since the a_i and b_j are non-negative, the addition of a large positive constant to every element of the payoff matrix yields a game with all positive p_ij and q_ij, and the optimal strategies are not altered (von Neumann and Morgenstern [1]). Therefore, assumption (i) does not result in loss of generality. Suppose (i) is satisfied and consider assumption (ii). We exclude the trivial case in which all a_i = b_j = 0, and assume that some a_i or b_j is positive. Every row of the payoff matrix with a_i = 0 or b_j = 0 is dominated by a row with positive a_i or b_j. Thus, the rows with zero a_i or b_j may be deleted from the payoff matrix, and assumption (ii) does not result in loss of generality. From assumptions (i) and (ii) it follows immediately that the optimal objective function values of (18) and (19) satisfy ρ̂, σ̂ > 0.

* Due to Charnes [1].

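The desired transformation for problem (18) is

$$ u_i' = \frac{a_i u_i}{\rho}, \quad i = 1, \ldots, m; \qquad v_j' = \frac{b_j v_j}{\rho}, \quad j = 1, \ldots, n \tag{20} $$

and for problem (19)

$$ y_{ij}' = \frac{y_{ij}}{\sigma}, \qquad i = 1, \ldots, m;\ j = 1, \ldots, n \tag{21} $$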
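These transformations yield the following dual pair of linear programs:

$$
\begin{aligned}
\min\quad & \sum_i \frac{1}{a_i} u_i' + \sum_j \frac{1}{b_j} v_j' \\
\text{subject to}\quad & p_{ij} u_i' + q_{ij} v_j' \ge 1, \qquad i = 1, \ldots, m;\ j = 1, \ldots, n \\
& u_i',\ v_j' \ge 0
\end{aligned}
\tag{22}
$$

$$
\begin{aligned}
\max\quad & \sum_{i,j} y_{ij}' \\
\text{subject to}\quad & \sum_j p_{ij} y_{ij}' \le \frac{1}{a_i}, \qquad i = 1, \ldots, m \\
& \sum_i q_{ij} y_{ij}' \le \frac{1}{b_j}, \qquad j = 1, \ldots, n \\
& y_{ij}' \ge 0
\end{aligned}
\tag{23}
$$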

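Since ρ, σ > 0, equations (20) and (21) establish a one-to-one correspondence between optimal solutions to (18) and (22), and between optimal solutions to (19) and (23). Let "hats" on the variables denote an optimal solution to (22) and (23). Then from (20) and (21) we have

$$ \hat u_i = \frac{\hat\rho\, \hat u_i'}{a_i}, \qquad \hat v_j = \frac{\hat\rho\, \hat v_j'}{b_j}, \qquad \hat\rho = \frac{1}{\sum_i \frac{1}{a_i} \hat u_i' + \sum_j \frac{1}{b_j} \hat v_j'} \tag{24} $$

$$ \hat y_{ij} = \hat\sigma\, \hat y_{ij}', \qquad \hat\sigma = \frac{1}{\sum_{i,j} \hat y_{ij}'} \tag{25} $$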


Problem (23) is a dyadic model as defined by Charnes and Cooper [5]. Actually, (23) is not the most general dyadic model, but it is substantially more general than the distribution (transportation) model. Special computational techniques are available to solve dyadic problems, but we do not dwell on them here. The reader is referred to Charnes and Cooper [5].
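The recovery formulas (24) and (25) translate directly into code. The sketch below (hypothetical function name) maps optimal solutions of (22) and (23) back to the value and strategies of the row-column game. The example data are a small hand-worked case chosen for illustration: a_i = b_j = 1, p = q = [[1, 0.5], [0.5, 1]], for which u' = v' = (1, 1) and y'_12 = y'_21 = 2 are optimal in (22) and (23), both objectives being equal to 4.

```python
def recover_strategies(a, b, u_prime, v_prime, y_prime):
    """Apply (24) and (25): rho = 1 / (sum u'_i/a_i + sum v'_j/b_j),
    u_i = rho*u'_i/a_i, v_j = rho*v'_j/b_j, sigma = 1 / sum y'_ij,
    y_ij = sigma*y'_ij."""
    rho = 1 / (sum(up / ai for up, ai in zip(u_prime, a))
               + sum(vp / bj for vp, bj in zip(v_prime, b)))
    u = [rho * up / ai for up, ai in zip(u_prime, a)]
    v = [rho * vp / bj for vp, bj in zip(v_prime, b)]
    sigma = 1 / sum(sum(row) for row in y_prime)
    y = [[sigma * yp for yp in row] for row in y_prime]
    return rho, u, v, sigma, y

rho, u, v, sigma, y = recover_strategies([1, 1], [1, 1], [1, 1], [1, 1],
                                         [[0, 2], [2, 0]])
print(rho, u, v)      # value 0.25; P1 sweeps each row and column w.p. 0.25
print(sigma, y)       # value 0.25; P2 hides in (1,2) or (2,1) w.p. 0.5 each
```

At an optimum ρ̂ and σ̂ coincide, which is a convenient check on any solution of the dyadic problem.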


2.9 Special Cases

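Certain tactical situations may be formulated as special cases of (22) and (23). One of these cases is when

$$ p_{ij} = q_{ij} \qquad \text{for all } i, j \tag{26} $$

Equation (26) implies that the contact probability for cell (i, j) is independent of whether P1 searches row i or column j, i.e., the direction in which the sweep is made is immaterial. Suppose (26) is satisfied, and let

$$ w_{ij} = p_{ij} y_{ij}' = q_{ij} y_{ij}' \tag{27} $$

Then problem (23) is transformed by (27), and its dual (22) by (26), to yield the following dual pair of linear programs:

$$
\begin{aligned}
\min\quad & \sum_i \frac{1}{a_i} u_i' + \sum_j \frac{1}{b_j} v_j' \\
\text{subject to}\quad & u_i' + v_j' \ge \frac{1}{p_{ij}}, \qquad i = 1, \ldots, m;\ j = 1, \ldots, n \\
& u_i',\ v_j' \ge 0
\end{aligned}
\tag{28}
$$

$$
\begin{aligned}
\max\quad & \sum_{i,j} \frac{1}{p_{ij}} w_{ij} \\
\text{subject to}\quad & \sum_j w_{ij} \le \frac{1}{a_i}, \qquad i = 1, \ldots, m \qquad \text{(29a)} \\
& \sum_i w_{ij} \le \frac{1}{b_j}, \qquad j = 1, \ldots, n \qquad \text{(29b)} \\
& w_{ij} \ge 0
\end{aligned}
\tag{29}
$$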
An additional simplification of the distribution problem (29) can be obtained. Summing over the constraints (29a) and (29b), respectively, we get

$$ \sum_{i,j} w_{ij} \le \sum_i \frac{1}{a_i}, \qquad \sum_{i,j} w_{ij} \le \sum_j \frac{1}{b_j} $$

Since 1/p_ij > 0 for all i and j, an optimal solution cannot leave slack in a row constraint (29a) while a column constraint (29b) also has slack, because increasing the corresponding w_ij would increase the objective. Hence the side with the smaller total capacity is filled exactly, and the same optimal solution obtains if we replace (29a) or (29b) by equalities as follows:

(i) If Σ_i 1/a_i < Σ_j 1/b_j, replace (29a) by equalities.

(ii) If Σ_i 1/a_i > Σ_j 1/b_j, replace (29b) by equalities.

(iii) If Σ_i 1/a_i = Σ_j 1/b_j, replace both (29a) and (29b) by equalities.

Under the indicated assumption (26), the row-column search game has been reduced to a distribution model. We give a tactical example of this model in the next section.

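A final simplification of the most general game obtains if, in addition to (26), we assume that a_i = b_j = 1 and m = n. Of course, these assumptions mean that the identification probabilities are one and the search region is divided into an equal number of rows and columns. Now (28) and (29) reduce to the following dual linear programs:

$$
\begin{aligned}
\min\quad & \sum_i u_i' + \sum_j v_j' \\
\text{subject to}\quad & u_i' + v_j' \ge \frac{1}{p_{ij}}, \qquad i, j = 1, \ldots, n \\
& u_i',\ v_j' \ge 0
\end{aligned}
\tag{30}
$$

$$
\begin{aligned}
\max\quad & \sum_{i,j} \frac{1}{p_{ij}} w_{ij} \\
\text{subject to}\quad & \sum_j w_{ij} \le 1, \qquad i = 1, \ldots, n \\
& \sum_i w_{ij} \le 1, \qquad j = 1, \ldots, n \\
& w_{ij} \ge 0
\end{aligned}
\tag{31}
$$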
From statement (iii) above, the constraints in (31) are satisfied as equalities by an optimal solution. But the equality form of (31) is the well-known assignment problem. This equivalent assignment problem was first obtained by von Neumann [1]. Our proof of equivalence is considerably more direct than his, owing to the linear programming characterization of a matrix game which is now available. This completes the transformations of the row-column game to dyadic-type models.
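For very small n, the assignment form of (31) can be solved by enumerating permutations, which makes the equivalence concrete. Since an optimal solution of the assignment problem can be taken at a permutation, and by (25) and (27) the value is σ̂ = 1 / Σ_{i,j} ŵ_ij / p_ij, the game value is the reciprocal of the optimal assignment sum. The function name and the 2 × 2 contact probabilities below are hypothetical; for realistic n a polynomial method such as the Hungarian algorithm would replace the O(n!) enumeration.

```python
from itertools import permutations

def square_row_column_value(p):
    """Case a_i = b_j = 1, m = n, p_ij = q_ij: maximize sum_i 1/p[i][s(i)]
    over permutations s (enumeration, small n only) and return the game
    value 1 / (optimal assignment sum) with the maximizing assignment."""
    n = len(p)
    best = max(permutations(range(n)),
               key=lambda s: sum(1 / p[i][s[i]] for i in range(n)))
    total = sum(1 / p[i][best[i]] for i in range(n))
    return 1 / total, best

print(square_row_column_value([[1, 0.5], [0.5, 1]]))   # (0.25, (1, 0))
```

In the example the off-diagonal cells are the hardest to contact, so the optimal assignment selects them and P2 hides there.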


2.10 Tactical Example


We conclude this chapter with an example of the row-column search game. In this example, the search will be conducted by a single aircraft. The search region of interest is partitioned into four rows and six columns, and the aircraft searches in row or column sweeps.

We assume that when P2 hides in cell (i, j), the contact probabilities depend on (i, j) but not on whether P1 searches row i or column j. Thus, we have p_ij = q_ij for all i, j, and this game is equivalent to the distribution problem (29). The contact probabilities and the identification probabilities are given in Table 2.4.

To apply standard methods to solve (29), we convert it to the standard equality form. For this example Σ_i 1/a_i > Σ_j 1/b_j; thus the inequalities (29b) are automatically satisfied as equalities by the optimal solution to (29). Therefore, we simply adjoin a dummy column to (29) to obtain the equivalent standard distribution problem. This standard form and the optimal solutions are given in Table 2.5. Finally, the optimal solutions are converted to strategies and tabulated in Table 2.6.
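The conversion to standard form can be sketched as follows. The function name is hypothetical and the numbers in the example are invented, not those of Table 2.4. When Σ_i 1/a_i > Σ_j 1/b_j, the rows of (29) have supplies 1/a_i and the columns have demands 1/b_j; a dummy column with demand equal to the excess supply and zero profit balances the problem, and the profit per unit shipped in cell (i, j) is 1/p_ij.

```python
def standard_distribution_form(a, b, p):
    """Balanced distribution (transportation) form of (29) for the case
    sum(1/a_i) > sum(1/b_j): supplies 1/a_i, demands 1/b_j plus a dummy
    column absorbing the excess supply at zero profit; profits are 1/p_ij."""
    supply = [1 / ai for ai in a]
    demand = [1 / bj for bj in b]
    excess = sum(supply) - sum(demand)
    if excess <= 0:
        raise ValueError("a dummy column is needed only when sum(1/a_i) > sum(1/b_j)")
    profit = [[1 / pij for pij in row] + [0.0] for row in p]
    return supply, demand + [excess], profit

supply, demand, profit = standard_distribution_form(
    [0.5, 0.5], [1, 1, 1], [[0.5, 0.25, 1], [1, 1, 0.5]])
print(supply, demand)   # [2.0, 2.0] [1.0, 1.0, 1.0, 1.0]
print(profit[0])        # [2.0, 4.0, 1.0, 0.0]
```

Any standard transportation code can then maximize total profit; the shipments into the dummy column are discarded when converting back to w_ij.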
Optimal Tableaus
The transformation formulas to obtain optimal strategies from the optimal solutions of Table 2.5 are:

$$ \frac{1}{\hat\sigma} = \sum_{i,j} \frac{1}{p_{ij}} \hat w_{ij} = 7.68 $$


Optimal Strategies for P2

    (i, j)                   ŷ_ij, solution 1     ŷ_ij, solution 2
    (1, 4)                   0.060                0.060
    (1, 5)                   0.060                0.060
    (2, 1)                   0.181                0.181
    (2, 2)                   0.361                0.361
    (3, 1)                   0.073                0.073
    (3, 6)                   0.120                0.060
    (4, 3)                   0.145                0.145
    (illegible in source)                         0.060
    all other entries of both solutions = 0

Optimal Strategy for P1

    u_2 = 0.092
    v_1 = 0.217
    v_2 = 0.306
    v_3 = 0.146
    v_4 = 0.060
    v_5 = 0.060
    v_6 = 0.121
    all other u_i = v_j = 0

Optimal Strategies
Table 2.6
CHAPTER III - SEQUENTIAL N-CELL GAME


3.1 Results

In this chapter, we formulate and solve a sequential search game. This game consists of a sequence of moves, and on each move the players are faced with an n-cell search game. We consider, in turn, both a finite and an infinite number of moves. In essence, on each move the players simultaneously choose a strategy in an n-cell search game and thereby determine an immediate payoff and, in addition, a probability that the n-cell game is played again. Hence, a sequence of n-cell games is played. We show how to find optimal strategies for both the finite and infinite games which minimax the expected accumulated payments.

For the finite game, we show how to characterize the value and optimal strategies in a recursive manner. In this way, we can compute the solution by linear programming methods.

A characterization of the solution of the infinite game results in a non-linear programming problem. However, if one variable is treated as a parameter, the resulting problem is a linear program. We show how to adjust this parameter in successive linear programs and thereby approximate a solution to the non-linear problem.
We discuss in detail two particular payoffs which are meaningful for ASW purposes. One of these payoffs reduces our characterization of the infinite game to a linear programming problem. Finally, examples are given, and we compare the sequential n-cell game to the non-sequential n-cell game.


3.2 Formulation of the Finite Game

First, we discuss the elements of the finite game, and then we proceed with the mathematical formulation. A play of the game consists of at most a finite number (N) of moves. On each move, when the game has not terminated, the players are faced with a two-person zero-sum game. In our formulation, we shall use the n-cell game as the two-person zero-sum game for each move. When the players move, they each choose a strategy which determines a zero-sum payoff from player 2 to player 1 and a probability that the game terminates before the next move. We wish to find an optimal strategy for each player which minimaxes the expected accumulated payments received by player 1.

For ASW purposes, we consider two particular payoffs. As in the n-cell game, the payoff for each move may be the probability that P1 detects P2 during the move. Then, as we shall see, the expected accumulated payment received by P1 is the probability that P1 detects P2. The other payoff considered is the time taken by one move. Here, P1 receives a payoff of one time unit regardless of the strategies chosen, and the expected accumulated payment is the expected duration of the game. In the following formulation, we use the generic term payoff to accommodate both of the above tactical payoffs and others as well.

The recursive optimization technique which we will propose has also been discussed by other authors. Kuhn [1] (1953) gave his theorem on games of perfect recall, which paved the way for further work. Shapley [1] (1953) was the first to use a recursive optimization technique for this type of game, although he did not deal with the finite case. Later contributions were made by Bellman [1] (1957), Zachrisson [1] (1964), and Denardo [1] (1965). However, the development given here differs in content and detail from the above cited works.

We assume that a search region is given and that the region is divided into n cells numbered i = 1, ..., n. We also assume that a pure strategy for each player on each move corresponds to the choice of a cell. From a tactical standpoint, a pure strategy for P1 (the searcher) is a cell to search and a pure strategy for P2 (the hider) is a cell in which to hide. Notice that we have assumed that the same set of pure strategies is available for each player on each move. We have taken this assumption for notational convenience; it could be relaxed. We further assume that a play of the game consists of at most N moves, and we number the possible moves r = 1, 2, ..., N. On each move, the players choose their strategies simultaneously, and the moves are made sequentially. Unless otherwise stated, we assume that P1 is the maximizing player.

The payoffs and continuation probabilities are now specified. Suppose that P1 searches cell i and P2 hides in cell j on move r. Then the payoff from P2 to P1 is

$$ a_{ij}(r), \qquad i, j = 1, \ldots, n, \qquad r = 1, \ldots, N. $$

Also, when P1 searches cell i and P2 hides in cell j on move r, the game continues until move r + 1 with probability

$$ p_{ij}(r), \qquad i, j = 1, \ldots, n, \qquad r = 1, \ldots, N - 1. $$

We let $A_r$ be the $n \times n$ matrix $A_r = (a_{ij}(r))$ and $P_r$ the $n \times n$ matrix $P_r = (p_{ij}(r))$. Hence, $A_r$ is P1's payoff matrix for move r and $P_r$ is the matrix of continuation probabilities for move r.

Next, we consider strategies for the players. We have assumed that the continuation probability and payoff depend only on the choices available for a particular move. It follows that the game is one of perfect recall as defined by Kuhn [1]. Kuhn's theorem for a game of perfect recall asserts that an optimal strategy for a particular move does not depend on preceding strategies. Hence, an optimal strategy for each particular move in our game is a mixed strategy over the alternatives available at that move. Kuhn calls this type of strategy a "behavior strategy". We restrict our attention to these mixed strategies without loss of generality.

Let $X_r$ and $Y_r$ be mixed strategies over the alternatives available on move r for P1 and P2 respectively. Let $X = (X_1, \ldots, X_N)$ be an N-tuple of mixed strategies for P1, with $X_r$ being the mixed strategy for move r. Similarly, let $Y = (Y_1, \ldots, Y_N)$ be P2's game strategy. Now we define the following sets of strategies:

$$ \mathcal{X}_r = \{X_r\}, \qquad \mathcal{Y}_r = \{Y_r\}, \qquad \mathcal{X} = \{X\}, \qquad \mathcal{Y} = \{Y\}. $$

From the above discussion, $\mathcal{X}$ and $\mathcal{Y}$ contain optimal game strategies for P1 and P2 respectively. These optimal game strategies are optimal with respect to the set of all possible strategies.

We will write the total expected payoff for P1 in terms of fixed strategies $X \in \mathcal{X}$, $Y \in \mathcal{Y}$ and the given information. If P1 chooses the strategy $X_r$ for move r and P2 chooses $Y_r$, then the payoff to P1 for move r is

$$ X_r^t A_r Y_r, \qquad r = 1, \ldots, N, $$

and the game continues until move r + 1 with probability

$$ X_r^t P_r Y_r, \qquad r = 1, \ldots, N - 1. $$

Now the expected payoff to P1 for move r is the product of the probability that the game continues until move r and the payoff for move r:

$$ X_r^t A_r Y_r \prod_{h=1}^{r-1} X_h^t P_h Y_h, \qquad r = 2, 3, \ldots, N. $$

The expected accumulated payoff for N moves, $v_1(X, Y)$, is the sum of the above expected payoffs for each move:

$$ v_1(X, Y) = X_1^t A_1 Y_1 + \sum_{r=2}^{N} X_r^t A_r Y_r \prod_{h=1}^{r-1} X_h^t P_h Y_h \tag{1} $$
Since the game has a finite number of moves and a finite number of strategies, it must have a value and optimal strategies (von Neumann and Morgenstern [1]). Recall that the sets $\mathcal{X}$ and $\mathcal{Y}$ contain optimal game strategies. Therefore, the function $v_1(X, Y)$ has at least one saddle point over the sets $\mathcal{X}$ and $\mathcal{Y}$. We propose a recursive optimization technique to find the saddle points of $v_1(X, Y)$.
3.3 Recursive Solution

In this section, we show how to compute the minimax of equation (1) by a recursive technique. Let $\bar X_r$ and $\bar Y_r$ denote the sequences of strategies

$$ \bar X_r = (X_r, X_{r+1}, \ldots, X_N), \qquad \bar Y_r = (Y_r, Y_{r+1}, \ldots, Y_N), \qquad r = 1, \ldots, N. $$

Of course, we have $\bar X_1 = X$ and $\bar Y_1 = Y$. We rewrite equation (1) and define the functions $v_r(\bar X_r, \bar Y_r)$ by

$$ v_r(\bar X_r, \bar Y_r) = X_r^t A_r Y_r + (X_r^t P_r Y_r)\, v_{r+1}(\bar X_{r+1}, \bar Y_{r+1}), \qquad r = 1, \ldots, N, \qquad v_{N+1} \equiv 0. \tag{2} $$

Now $v_r(\bar X_r, \bar Y_r)$ may be interpreted as the expected accumulated payments received by P1 on the last N - r + 1 moves of the game, given that move r is reached.
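As a numerical sanity check on equations (1) and (2), the sketch below evaluates the expected accumulated payoff for fixed behavior strategies both directly from (1) and by the backward recursion (2); the two should agree. The function names and the strategy data are hypothetical, and matrices are nested Python lists indexed from 0, so `A[r - 1]` holds $A_r$.

```python
# Expected accumulated payoff v_1(X, Y) of the finite sequential game for
# fixed behavior strategies, computed from equation (1) and from the
# backward recursion (2). Function names and data are illustrative only.


def bilinear(x, M, y):
    """Return x^t M y for mixed strategies x, y and a payoff matrix M."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def payoff_eq1(X, Y, A, P):
    """Equation (1): sum over moves of P(reach move r) * X_r^t A_r Y_r."""
    total, reach = 0.0, 1.0          # move 1 is reached with probability one
    for r in range(len(A)):
        total += reach * bilinear(X[r], A[r], Y[r])
        if r < len(A) - 1:           # P_r is only defined for r = 1, ..., N - 1
            reach *= bilinear(X[r], P[r], Y[r])
    return total


def payoff_eq2(X, Y, A, P):
    """Equation (2): v_r = X_r^t A_r Y_r + (X_r^t P_r Y_r) v_{r+1}, v_{N+1} = 0."""
    v = 0.0                          # v_{N+1}
    for r in reversed(range(len(A))):
        cont = bilinear(X[r], P[r], Y[r]) if r < len(A) - 1 else 0.0
        v = bilinear(X[r], A[r], Y[r]) + cont * v
    return v


# Hypothetical two-cell, three-move data; the list repetition shares the
# inner matrices, which is harmless because they are only read.
A = [[[0.1, 0.2], [0.3, 0.1]]] * 3      # A_1, A_2, A_3
P = [[[0.8, 0.7], [0.6, 0.7]]] * 2      # P_1, P_2
X = [[0.5, 0.5], [0.7, 0.3], [0.2, 0.8]]
Y = [[0.4, 0.6], [0.5, 0.5], [0.9, 0.1]]
print(payoff_eq1(X, Y, A, P), payoff_eq2(X, Y, A, P))
```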

It is intuitively clear from equation (2) that the value and optimal strategies may be computed recursively. We shall establish this fact. We define $\hat v_r$, $\hat X_r$, $\hat Y_r$ by the following equations:

$$ \hat v_r = \max_{X_r \in \mathcal{X}_r} \min_{Y_r \in \mathcal{Y}_r} \left[ X_r^t A_r Y_r + (X_r^t P_r Y_r)\, \hat v_{r+1} \right] = \hat X_r^t A_r \hat Y_r + (\hat X_r^t P_r \hat Y_r)\, \hat v_{r+1}, \qquad r = 1, \ldots, N, \qquad \hat v_{N+1} = 0. \tag{3} $$

The minimax theorem of von Neumann [2] establishes the existence of $\hat X_r$, $\hat Y_r$, $\hat v_r$ for equation (3). The following theorem relates the solutions of equation (3) to the solutions of the sequential game.

Theorem 1: $\hat v_1$ is the value of the sequential game, and $\hat X = (\hat X_1, \ldots, \hat X_N)$, $\hat Y = (\hat Y_1, \ldots, \hat Y_N)$ are optimal strategies for P1 and P2 respectively.

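Proof - Since $v_1(X, Y)$ is the expected payoff function for the sequential game, a necessary and sufficient condition for $\hat v_1$ to be the value of the game and $\hat X$, $\hat Y$ optimal strategies is

$$ v_1(X, \hat Y) \le \hat v_1 \le v_1(\hat X, Y) \qquad \text{all } X \in \mathcal{X},\ Y \in \mathcal{Y}. $$

We shall show that this condition is satisfied by $\hat v_1$, $\hat X$, $\hat Y$ as defined by (3). From (3) we have

$$ X_r^t A_r \hat Y_r + (X_r^t P_r \hat Y_r)\, \hat v_{r+1} \le \hat v_r \le \hat X_r^t A_r Y_r + (\hat X_r^t P_r Y_r)\, \hat v_{r+1} \qquad \text{all } X_r \in \mathcal{X}_r,\ Y_r \in \mathcal{Y}_r. \tag{4} $$

With $\hat{\bar X}_r = (\hat X_r, \ldots, \hat X_N)$ and $\bar Y_r = (Y_r, \ldots, Y_N)$ as in (2), we begin an inductive argument (backward in r). Since $\hat v_{N+1} = 0$, (4) gives

$$ v_N(\hat{\bar X}_N, \bar Y_N) = \hat X_N^t A_N Y_N \ge \hat v_N \qquad \text{all } Y_N. $$

Assume

$$ v_{r+1}(\hat{\bar X}_{r+1}, \bar Y_{r+1}) \ge \hat v_{r+1} \qquad \text{all } \bar Y_{r+1} $$

for some r. By definition,

$$ v_r(\hat{\bar X}_r, \bar Y_r) = \hat X_r^t A_r Y_r + (\hat X_r^t P_r Y_r)\, v_{r+1}(\hat{\bar X}_{r+1}, \bar Y_{r+1}) \qquad \text{all } \bar Y_r. $$

By the inductive assumption and $\hat X_r^t P_r Y_r \ge 0$,

$$ v_r(\hat{\bar X}_r, \bar Y_r) \ge \hat X_r^t A_r Y_r + (\hat X_r^t P_r Y_r)\, \hat v_{r+1} \qquad \text{all } \bar Y_r. $$

From the above inequality and the right-hand inequality of (4),

$$ v_r(\hat{\bar X}_r, \bar Y_r) \ge \hat v_r \qquad \text{all } \bar Y_r. $$

Hence, by induction on r,

$$ \hat v_1 \le v_1(\hat X, Y) \qquad \text{all } Y \in \mathcal{Y}. $$

Similarly, we may establish

$$ \hat v_1 \ge v_1(X, \hat Y) \qquad \text{all } X \in \mathcal{X}; $$

therefore

$$ v_1(X, \hat Y) \le \hat v_1 \le v_1(\hat X, Y) \qquad \text{all } X \in \mathcal{X},\ Y \in \mathcal{Y}, $$

and the theorem is true.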

We have established that the value and optimal strategies may be computed by means of equation (3).

For each fixed r in (3) we must solve an ordinary matrix game. The game has the payoff matrix $A_r + \hat v_{r+1} P_r$ with $\hat v_{r+1}$ known. As in Chapter 2, we draw on the following linear programming formulation of this game due to Charnes [1].


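L.P.$(\hat v_{r+1})$, $r = 1, \ldots, N$:

$$ \begin{aligned} \max\ & v_r \\ \text{subject to}\ & v_r - \sum_{i=1}^{n} x_{ir} \left[ a_{ij}(r) + \hat v_{r+1}\, p_{ij}(r) \right] \le 0, \qquad j = 1, \ldots, n \\ & \sum_{i=1}^{n} x_{ir} = 1 \\ & x_{ir} \ge 0, \qquad i = 1, \ldots, n \end{aligned} $$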
From Charnes [1], an optimal solution to L.P.$(\hat v_{r+1})$ solves equation (3), i.e., it yields an optimal strategy $\hat X_r$ for P1 on move r. Of course, an optimal solution to the dual of L.P.$(\hat v_{r+1})$ also yields an optimal strategy $\hat Y_r$ for P2, and this strategy is available when the primal is solved.

The value and optimal strategies may be computed by the following method.

1. Set $\hat v_{N+1} = 0$.

2. Given $\hat v_{r+1}$ (starting with r = N), solve L.P.$(\hat v_{r+1})$ for an optimal solution $\hat v_r$, $\hat X_r$, and also obtain a dual optimal strategy $\hat Y_r$.

3. Replace r by r - 1 and return to step 2 until $\hat v_1$ is computed.

As we have shown, $\hat v_1$ is the value of the game and $\hat X = (\hat X_1, \ldots, \hat X_N)$, $\hat Y = (\hat Y_1, \ldots, \hat Y_N)$ are optimal game strategies for P1 and P2 respectively. In the next section, we give a simple example of the above method.
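The three-step method can be sketched in Python with SciPy's `scipy.optimize.linprog` standing in for the simplex code. This is a minimal sketch under stated assumptions: the helper names are hypothetical, P1's strategy comes from the primal L.P.$(\hat v_{r+1})$, and, to keep the code short, P2's strategy comes from the mirror-image LP for the column player instead of from the simplex dual. The data are the two-cell matrices of the example in section 3.4, with the unclear entry $p_{11}$ taken as .8 (an assumption, read off the stop matrix Q of section 3.7).

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Value and optimal mixed strategies of the zero-sum matrix game M.

    The row player (P1) maximizes.  P1's strategy comes from the LP
        max v  s.t.  v - sum_i x_i M_ij <= 0 (all j),  sum_i x_i = 1,  x >= 0;
    P2's strategy from the mirror LP
        min w  s.t.  sum_j M_ij y_j - w <= 0 (all i),  sum_j y_j = 1,  y >= 0.
    """
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    free = [(None, None)]                        # v and w are unrestricted in sign
    res_x = linprog(c=np.r_[np.zeros(n), -1.0],   # linprog minimizes, so use -v
                    A_ub=np.c_[-M.T, np.ones(m)], b_ub=np.zeros(m),
                    A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                    bounds=[(0, None)] * n + free)
    res_y = linprog(c=np.r_[np.zeros(m), 1.0],
                    A_ub=np.c_[M, -np.ones(n)], b_ub=np.zeros(n),
                    A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                    bounds=[(0, None)] * m + free)
    if not (res_x.success and res_y.success):
        raise RuntimeError("linear program did not solve")
    return -res_x.fun, res_x.x[:n], res_y.x[:m]


def solve_finite_sequential_game(A, P):
    """Steps 1-3: backward recursion on L.P.(v_{r+1}) for r = N, ..., 1.

    A holds A_1..A_N and P holds P_1..P_{N-1}; the returned lists are
    indexed by move number r (index 0 unused, v_hat[N + 1] = 0).
    """
    N = len(A)
    v_hat = [0.0] * (N + 2)                       # step 1: v_hat[N + 1] = 0
    X_hat, Y_hat = [None] * (N + 1), [None] * (N + 1)
    for r in range(N, 0, -1):                     # steps 2 and 3
        M = np.asarray(A[r - 1], dtype=float)
        if r < N:                                 # P_N is never needed
            M = M + v_hat[r + 1] * np.asarray(P[r - 1], dtype=float)
        v_hat[r], X_hat[r], Y_hat[r] = solve_matrix_game(M)
    return v_hat, X_hat, Y_hat


A = [[[0.1, 0.2], [0.3, 0.1]]] * 3   # A_r for r = 1, 2, 3
P = [[[0.8, 0.7], [0.6, 0.7]]] * 2   # P_r for r = 1, 2 (p_11 = .8 assumed)
v_hat, X_hat, Y_hat = solve_finite_sequential_game(A, P)
print(v_hat[1], X_hat[1], Y_hat[1])
```

With these data the last move is the one-move game A alone, and the recursion returns $\hat v_3 = 1/6 \approx .167$, the one-move detection probability quoted at the end of section 3.7.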
3.4 The Negligible Radius of Detection Assumption and an Example
We briefly examine the negligible radius of detection assumption for the sequential n-cell game. In section 2.6 we developed this assumption for the n-cell game and found the optimal strategies and value. Of course, some of the results obtained in section 2.6 will carry over directly to the present discussion. To make the negligible radius of detection assumption, we assume that $A_r$ and $P_r$ are diagonal matrices (r = 1, ..., N). We denote the diagonal elements of $A_r$ and $P_r$ by $a_{ir}$ and $p_{ir}$ (i = 1, ..., n) respectively. Under this assumption, if P1 and P2 both choose cell i on move r, then the payoff to P1 is $a_{ir}$ and the game continues until move r + 1 with probability $p_{ir}$; otherwise, the payoff and continuation probability are zero. In terms of tactics, this model could be used in situations where P1 already has a contact with P2 and P1 has a positive probability of maintaining the contact with P2 (the game may continue) only if P1 looks in the cell where P2 is hiding. Otherwise, P2 evades P1.

In the present special case, equation (3) is rendered

$$ \hat v_r = \max_{X_r} \min_{Y_r} \left[ \sum_{i=1}^{n} x_{ir} a_{ir} y_{ir} + \left( \sum_{i=1}^{n} x_{ir} p_{ir} y_{ir} \right) \hat v_{r+1} \right], \qquad r = 1, \ldots, N, \qquad \hat v_{N+1} = 0. $$

As in section 2.6, we assume without loss of generality that

$$ a_{ir} > 0, \qquad \text{all } i, r. $$

For move N the solution is

$$ \hat x_{iN} = \hat y_{iN} = \frac{\hat v_N}{a_{iN}}, \quad i = 1, \ldots, n, \qquad \hat v_N = \left[ \sum_{i=1}^{n} \frac{1}{a_{iN}} \right]^{-1}, $$

and, in general, the solution is

$$ \hat x_{ir} = \hat y_{ir} = \frac{\hat v_r}{a_{ir} + \hat v_{r+1}\, p_{ir}}, \quad i = 1, \ldots, n, \qquad \hat v_r = \left[ \sum_{i=1}^{n} \frac{1}{a_{ir} + \hat v_{r+1}\, p_{ir}} \right]^{-1}. \tag{5} $$


For the N-th move, the optimal strategies are identical to those of an n-cell game with diagonal payoff matrix $A_N$. That is to say, P2 chooses a hiding cell with probability inversely proportional to the payoff for that cell. For moves other than the N-th, an optimal strategy depends on the current payoff and the continuation probability, as shown in equation (5). This seems to be a "reasonable" optimal strategy for P2. On move N, the theory tells P1 to look with the highest probability in the cell with the smallest probability of detection because, in a sense, P2 is likely to hide in the cell with the lowest probability of detection.
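Under the negligible radius of detection assumption no linear program is needed, because equation (5) can be evaluated directly, backward from move N. The sketch below is a minimal illustration; the function name and the diagonal data are hypothetical, and lists are indexed from 0 so that `a[r]` holds the diagonal of $A_{r+1}$.

```python
def diagonal_game(a, p):
    """Closed-form solution (5) of the sequential n-cell game with diagonal A_r, P_r.

    a[r][i] = a_ir > 0 and p[r][i] = p_ir, with r = 0, ..., N-1 standing for
    moves 1, ..., N (p may omit the last move, since v_{N+1} = 0).
    Returns v[r] (value from move r+1 on) and x[r], the common optimal
    strategy x_ir = y_ir of both players on that move.
    """
    N = len(a)
    v, x = [0.0] * N, [None] * N
    v_next = 0.0                                   # v_{N+1} = 0
    for r in reversed(range(N)):
        # Effective diagonal payoff on this move: a_ir + v_{r+1} p_ir.
        d = [a[r][i] + (v_next * p[r][i] if r < N - 1 else 0.0)
             for i in range(len(a[r]))]
        v[r] = 1.0 / sum(1.0 / di for di in d)     # v_r = [sum_i 1/d_i]^(-1)
        x[r] = [v[r] / di for di in d]             # inversely proportional to d_i
        v_next = v[r]
    return v, x


a = [[0.1, 0.3]] * 3     # hypothetical detection payoffs per cell, moves 1..3
p = [[0.8, 0.6]] * 2     # hypothetical continuation probabilities, moves 1..2
v, x = diagonal_game(a, p)
print(v, x)
```

On the last move the sketch puts probability .75 on the cell with payoff .1 and .25 on the cell with payoff .3, which is the inverse-proportionality rule described above.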


We close the discussion of the finite sequential game with a simple example. Consider a game with two cells and the same A and P matrices for every move. This game will have at most three moves. Later, we compare this game with one where an infinite number of moves is allowed. The given information for the game is

$$ A = \begin{pmatrix} .1 & .2 \\ .3 & .1 \end{pmatrix}, \qquad P = \begin{pmatrix} .8 & .7 \\ .6 & .7 \end{pmatrix}, \qquad N = 3. $$


We  solve  this  game  by  a  simple  graphical  method.  Starting  with 


3.5 Formulation of the Infinite Sequential Game

In this section, we allow an infinite number of moves in the sequential game. Before giving an analytic formulation, we discuss some of the features of the game. In the infinite sequential game there is no maximum number of moves; the continuation probabilities alone control the termination of the game. To obtain a manageable analytic problem, we must assume that one payoff matrix and one continuation probability matrix are specified for all moves. We further assume that the probability of continuing until the next move is strictly less than one for all pairs of strategies. This assumption guarantees boundedness of the expected accumulated payments received by P1, and it guarantees that the game terminates with probability one, although the number of moves may not be bounded. We will discuss these assumptions in more detail when we consider a more complicated version of this game in the next chapter. Now we turn to a formal definition of the game under consideration.

We assume that a search region is specified and that it is divided into n cells. If P1 chooses cell i (i = 1, ..., n) and P2 chooses cell j (j = 1, ..., n) on move r (r = 1, 2, ...), then P1 receives from P2 the payoff

$$ a_{ij} $$

and the game continues until move r + 1 with probability

$$ p_{ij}, \qquad 0 \le p_{ij} < 1. \tag{6} $$

Let P be the $n \times n$ matrix $P = (p_{ij})$ and A the $n \times n$ matrix $A = (a_{ij})$. A is the payoff matrix, and P is the matrix of continuation probabilities, for every move. We further assume that the game is zero sum and that P1 is the maximizing player.

The game which we have defined above is one of "perfect recall"; and by Kuhn's [1] theorem, a "behavior strategy" is optimal. Briefly, a behavior strategy is defined with reference to the information sets in the game. If a player uses a behavior strategy, he plays the same mixed strategy over the alternatives in an information set each time the information set is reached, regardless of the past history of the game. In the infinite sequential game, there is only one information set and, therefore, a behavior strategy is simply a mixed strategy which is used for every move. We restrict our attention to these strategies.

Let $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$ be behavior strategies (mixed strategies over the alternatives) for P1 and P2 respectively. For example, P1 chooses alternative i with probability $x_i$ on every move. The expected accumulated payment received by P1, $v(X, Y)$, is simply the sum over all moves of the probability that the game lasts until that move times the payment to P1 for that move; the game reaches move r + 1 with probability $(X^t P Y)^r$, so

$$ v(X, Y) = \sum_{r=0}^{\infty} (X^t P Y)^r\, X^t A Y \tag{7} $$
The above sum converges, since (6) implies $0 \le X^t P Y < 1$ for all strategies X, Y. For convenience, we define the matrix $Q = (q_{ij})$ with $q_{ij} = 1 - p_{ij}$, all i, j. Then Q is the matrix of positive termination probabilities. Summing the geometric series in (7), and noting that $X^t e = e^t Y = 1$ (e the vector of all ones) gives $1 - X^t P Y = X^t Q Y$, equation (7) may be written as

$$ v(X, Y) = \frac{X^t A Y}{1 - X^t P Y} = \frac{X^t A Y}{X^t Q Y} \tag{8} $$
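A quick numerical check of the step from (7) to (8): a long partial sum of the geometric series in (7) should agree with the closed form (8). The sketch assumes hypothetical strategies, uses the two-cell matrices of the example in section 3.7 (with P = 1 - Q entrywise), and truncates the series at an arbitrary 2000 terms.

```python
def bilinear(x, M, y):
    """Return x^t M y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def v_series(X, Y, A, P, terms=2000):
    """Partial sum of (7): sum_{r=0}^{terms-1} (X^t P Y)^r X^t A Y."""
    xay, xpy = bilinear(X, A, Y), bilinear(X, P, Y)
    return sum(xpy ** r * xay for r in range(terms))


def v_ratio(X, Y, A, P):
    """Closed form (8): X^t A Y / X^t Q Y with q_ij = 1 - p_ij."""
    Q = [[1.0 - pij for pij in row] for row in P]
    return bilinear(X, A, Y) / bilinear(X, Q, Y)


A = [[0.1, 0.2], [0.3, 0.1]]
P = [[0.8, 0.7], [0.6, 0.7]]
X, Y = [0.7, 0.3], [0.4, 0.6]     # arbitrary mixed strategies
print(v_series(X, Y, A, P), v_ratio(X, Y, A, P))
```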


von Neumann [3] first established the existence of a unique value v and optimal strategies $\hat X$ and $\hat Y$ for the form in (8), i.e., there exist a unique real number v and strategies $\hat X$, $\hat Y$ such that

$$ \frac{X^t A \hat Y}{X^t Q \hat Y} \le v \le \frac{\hat X^t A Y}{\hat X^t Q Y} \qquad \text{all strategies } X, Y. \tag{9} $$

An elementary proof of this fact was subsequently given by Loomis [1], and this result is a special case of Shapley's [1] more general "stochastic game". Neuts [1] formulated and solved a special case of the infinite sequential game. His P matrix was a diagonal matrix, and his A matrix also had a special form.
3.6 Solution by Linear Programming Methods

In the last section, we formulated the game of interest and noted the existence of a solution (a value and optimal strategies). However, there are no known methods for computing a solution. In this section, we develop a computational method to approximate a solution. The method is based on a linear programming formulation of the game with an unknown parameter in the constraints. We show that this parameter is equal to the value of the game if, and only if, the optimal value of the objective function of the linear program is zero. The remainder of our discussion is then devoted to a method for approximating the required value of the parameter.

To begin, we establish Lemma 1, which relates the solution of the infinite sequential game to the solution of an ordinary two-person zero-sum game.

Lemma 1: A necessary and sufficient condition for v to be the value of the infinite sequential game and $\hat X$, $\hat Y$ optimal strategies is that the two-person zero-sum game with payoff matrix $A - vQ$ has value zero and optimal strategies $\hat X$, $\hat Y$.

Proof - For the game $A - vQ$ to have value zero and optimal strategies $\hat X$, $\hat Y$, it is necessary and sufficient that

$$ X^t (A - vQ) \hat Y \le 0 \le \hat X^t (A - vQ) Y \qquad \text{all strategies } X, Y. \tag{10} $$

But $X^t Q Y > 0$ for all strategies X, Y, since every $q_{ij} > 0$ by (6). Hence, dividing the left-hand inequality of (10) by $X^t Q \hat Y$ and the right-hand one by $\hat X^t Q Y$, we see that v, $\hat X$, $\hat Y$ satisfy (10) if, and only if,

$$ \frac{X^t A \hat Y}{X^t Q \hat Y} \le v \le \frac{\hat X^t A Y}{\hat X^t Q Y} \qquad \text{all strategies } X, Y. \tag{10a} $$

Equation (10a) is a necessary and sufficient condition for v to be the value of the infinite sequential game and $\hat X$, $\hat Y$ optimal strategies. Hence, the lemma is true.

Lemma 1 immediately suggests a method for computing v. The main idea is to choose a number s and compute the value of the game $A - sQ$. If the value of $A - sQ$ is zero, then s = v and we are finished. If the value of $A - sQ$ is not zero, then we want to choose a new value of s, say $s_1$, such that the value of $A - s_1 Q$ is "closer" to zero than the value of $A - sQ$. We begin by formulating the game $A - sQ$ as a linear program.

Consider the linear program

$$ \begin{aligned} \max\ & u_s \\ \text{subject to}\ & u_s e^t - X_s^t (A - sQ) \le 0 \\ & X_s^t e = 1 \\ & X_s \ge 0 \end{aligned} \tag{11} $$

where e is the $n \times 1$ vector of all ones, $X_s$ is an $n \times 1$ vector, and s is a fixed scalar. Let $\hat u_s$, $\hat X_s$ be an optimal solution to (11). Then, from Charnes [1], $\hat X_s$ is an optimal strategy for P1 and $\hat u_s$ is the value of the game $A - sQ$ (s fixed). Of course, an optimal strategy $\hat Y_s$ for P2 is part of an optimal solution to the dual of (11), and $\hat Y_s$ is available when (11) is solved by the simplex method.

Next, we examine the variation in $\hat u_s$ which results from a change in s. We consider a perturbation from s to $s + \xi$ in problem (11), and we want to relate $\hat u_{s+\xi}$ to $\hat u_s$. Accordingly, we add and subtract the vector $\xi X_s^t Q$ in the constraints of (11) and obtain the following equivalent linear program

$$ \begin{aligned} \max\ & u_s \\ \text{subject to}\ & u_s e^t - X_s^t \left( A - (s + \xi) Q \right) - \xi X_s^t Q \le 0 \\ & X_s^t e = 1 \\ & X_s \ge 0 \end{aligned} \tag{12} $$

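We seek to obtain a linear programming formulation of the game $A - (s + \xi) Q$ from (12). Hence, we let $\underline q = \min_{i,j} q_{ij}$ and $\bar q = \max_{i,j} q_{ij}$. Since each component of $X_s^t Q$ is a convex combination of the entries in one column of Q, for $\xi > 0$

$$ \xi \underline q\, e^t \le \xi X_s^t Q \le \xi \bar q\, e^t \qquad \text{all strategies } X_s. \tag{13} $$

Now consider the following linear program

$$ \begin{aligned} \max\ & u' \\ \text{subject to}\ & u' e^t - X^t \left( A - (s + \xi) Q \right) \le \xi \bar q\, e^t \\ & X^t e = 1 \\ & X \ge 0 \end{aligned} \tag{14} $$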

Problem (14) is "less constrained" than (12): by (13), every feasible solution of (12) is feasible for (14). Therefore, the respective optimal solutions must satisfy ("hats" on the variables denote optimal values)

$$ \hat u' \ge \hat u_s \tag{15} $$

Notice that the right-hand side of the constraints in (14) is a constant vector. We bring this vector over to the left-hand side of the constraints and make the change of variable

$$ u_{s+\xi} = u' - \xi \bar q \tag{16} $$

to obtain the program

$$ \begin{aligned} \max\ & (u_{s+\xi} + \xi \bar q) \\ \text{subject to}\ & u_{s+\xi}\, e^t - X^t \left( A - (s + \xi) Q \right) \le 0 \\ & X^t e = 1 \\ & X \ge 0 \end{aligned} \tag{17} $$

But (17) is the desired linear programming formulation of the game $A - (s + \xi) Q$, except for the additive constant $+\xi \bar q$ in the objective function.

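It follows that $\hat u' = \hat u_{s+\xi} + \xi \bar q$, so (15) gives the left-hand inequality below; the analogous argument, with the lower bound in (13) and a correspondingly more constrained program, gives the right-hand one. Hence, for $\xi > 0$,

$$ \hat u_s - \xi \bar q \le \hat u_{s+\xi} \le \hat u_s - \xi \underline q, \qquad \xi > 0, \tag{18} $$

and for $\xi < 0$ we can derive the relationship

$$ \hat u_s - \xi \underline q \le \hat u_{s+\xi} \le \hat u_s - \xi \bar q, \qquad \xi < 0. \tag{19} $$

Equations (18) and (19) give the desired relationships. We can choose a starting value of s and then subsequently perturb $\hat u_s$ towards zero.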
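Relations (18) and (19) say that $\hat u_s$ decreases in s with slope between $-\bar q$ and $-\underline q$, so a single evaluation of $\hat u_s$ already brackets the root: if $\hat u_s \ge 0$ the root lies in $[s + \hat u_s/\bar q,\ s + \hat u_s/\underline q]$, and if $\hat u_s < 0$ it lies in $[s + \hat u_s/\underline q,\ s + \hat u_s/\bar q]$. The sketch below checks this on the 2 × 2 data of section 3.7. It uses the standard closed-form solution of a 2 × 2 game instead of linear program (11), and the function names are hypothetical.

```python
def value_2x2(M):
    """Value of a 2x2 zero-sum game in which the row player maximizes."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))   # best guaranteed pure-row payoff
    minimax = min(max(a, c), max(b, d))   # best guaranteed pure-column payoff
    if maximin == minimax:                # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)


def u_hat(s, A, Q):
    """u_s: value of the game A - sQ (2x2 case)."""
    return value_2x2([[A[i][j] - s * Q[i][j] for j in range(2)] for i in range(2)])


def root_bracket(s, A, Q):
    """Interval containing the root of u_s, implied by (18) and (19)."""
    q_lo = min(min(row) for row in Q)
    q_hi = max(max(row) for row in Q)
    u = u_hat(s, A, Q)
    if u >= 0:                            # root at s + xi with xi >= 0: use (18)
        return s + u / q_hi, s + u / q_lo
    return s + u / q_lo, s + u / q_hi     # xi < 0: use (19)


A = [[0.1, 0.2], [0.3, 0.1]]
Q = [[0.2, 0.3], [0.4, 0.3]]
for s in (0.5, 0.7):
    print(s, u_hat(s, A, Q), root_bracket(s, A, Q))
```

For these data, s = .5 gives the interval [.5625, .625], which already contains the root.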

Before giving a tactical example, we determine two numbers m and M ($m \le M$) such that $\hat u_m \ge 0$ and $\hat u_M \le 0$. Then, since $\hat u_s$ is a continuous function of s,¹ $\hat u_s = 0$ for some s in the range $m \le s \le M$. Suppose we choose

$$ m = \min_{i,j} \frac{a_{ij}}{q_{ij}}, \qquad M = \max_{i,j} \frac{a_{ij}}{q_{ij}}; \tag{20} $$

then

$$ m\, q_{ij} \le a_{ij}, \qquad M\, q_{ij} \ge a_{ij}, \qquad \text{all } i, j. $$

From the constraints of (11), we see that

$$ \hat u_s = \max_{X_s} \min_{j} \sum_{i=1}^{n} x_{si} \left( a_{ij} - s\, q_{ij} \right); $$

thus

$$ \hat u_m \ge 0, \qquad \hat u_M \le 0. $$

With certain restrictions on the elements $a_{ij}$, we can derive tighter bounds than m and M; but for our purposes, the bounds given here are adequate. Since the value of the game v satisfies $\hat u_v = 0$, v must be in the range $m \le v \le M$, and we restrict our attention to this range.

¹This fact is clear from the foregoing derivation.
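The pieces of this section combine into a complete, if unsophisticated, procedure. The sketch below is an illustration under stated assumptions rather than the method as stated: it starts from the bounds m and M of (20), evaluates $\hat u_s$ by passing (11) to SciPy's `scipy.optimize.linprog`, and, in place of the hand-chosen trial values and linear interpolation used in the example of section 3.7, simply bisects on s. Bisection is valid here because (18) makes $\hat u_s$ strictly decreasing. Once s has converged, Lemma 1 supplies the optimal strategies as those of the game $A - sQ$. Helper names are hypothetical, and every $q_{ij}$ must be positive, as (6) guarantees.

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Value and optimal strategies of the zero-sum game M (row player maximizes)."""
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    free = [(None, None)]
    # P1: max u s.t. u - sum_i x_i M_ij <= 0 (all j), sum x = 1, x >= 0, as in (11).
    res_x = linprog(c=np.r_[np.zeros(n), -1.0],
                    A_ub=np.c_[-M.T, np.ones(m)], b_ub=np.zeros(m),
                    A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                    bounds=[(0, None)] * n + free)
    # P2: min w s.t. sum_j M_ij y_j - w <= 0 (all i), sum y = 1, y >= 0.
    res_y = linprog(c=np.r_[np.zeros(m), 1.0],
                    A_ub=np.c_[M, -np.ones(n)], b_ub=np.zeros(n),
                    A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                    bounds=[(0, None)] * m + free)
    if not (res_x.success and res_y.success):
        raise RuntimeError("linear program did not solve")
    return -res_x.fun, res_x.x[:n], res_y.x[:m]


def infinite_game(A, Q, tol=1e-8):
    """Approximate v, X_hat, Y_hat of the infinite sequential game by bisection on s."""
    A, Q = np.asarray(A, dtype=float), np.asarray(Q, dtype=float)
    lo, hi = (A / Q).min(), (A / Q).max()    # m and M of (20): u_m >= 0 >= u_M
    while hi - lo > tol:
        s = 0.5 * (lo + hi)
        u_s = solve_matrix_game(A - s * Q)[0]
        if u_s > 0:
            lo = s        # u_s decreases in s, so the root lies to the right
        else:
            hi = s
    v = 0.5 * (lo + hi)
    _, x_hat, y_hat = solve_matrix_game(A - v * Q)   # Lemma 1
    return v, x_hat, y_hat


A = [[0.1, 0.2], [0.3, 0.1]]
Q = [[0.2, 0.3], [0.4, 0.3]]
v, x_hat, y_hat = infinite_game(A, Q)
print(round(v, 4), x_hat, y_hat)
```

For the data of section 3.7 the bisection should settle near 0.5918, the smaller root of $6s^2 - 12s + 5 = 0$ (the condition that the 2 × 2 game $A - sQ$ have value zero), which is consistent with the interpolated estimate .593 obtained by hand there.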
3.7 Tactical Payoffs and an Example

Two reasonable objectives for P1 in tactical situations are:

1. Minimize the time to detect P2.

2. Maximize the probability of detecting P2.

Of course, P2 maximizes when P1 minimizes and vice versa.

The first objective is discussed in the next section, and the resulting game can be formulated as a linear program. No iterative solution technique is required. To obtain objective 2 above, we interpret $a_{ij}$ as the probability that P1 detects P2 in one move. We also allow the game to terminate by several means. For example, the game terminates if P1 detects P2, P2 sinks P1, or P2 escapes from the search region. Hence, the probability that the game continues until the next move is no larger than one minus the probability that P1 detects P2, i.e., $p_{ij} \le 1 - a_{ij}$ for all (i, j). With this condition, the value v will satisfy $0 \le v \le 1$, and v is in fact the probability that P1 eventually detects P2. Also, notice that if we require that P1 either detects P2 or the game continues, then $p_{ij} = 1 - a_{ij}$ (all i, j), so that Q = A. From equation (8), v = 1; i.e., P1 eventually detects P2 with probability one and all strategies are optimal.

To illustrate the method developed in the last section, we present the following example. We assume that the search region consists of two cells and that P1 wants to maximize the probability of detecting P2. The following payoffs A and stop probabilities Q are given:

$$ A = \begin{pmatrix} .1 & .2 \\ .3 & .1 \end{pmatrix}, \qquad Q = \begin{pmatrix} .2 & .3 \\ .4 & .3 \end{pmatrix} $$

With the above data, linear program (11) becomes

$$ \begin{aligned} \max\ & u_s \\ \text{subject to}\ & u_s - x_{s1}(.1 - .2s) - x_{s2}(.3 - .4s) \le 0 \\ & u_s - x_{s1}(.2 - .3s) - x_{s2}(.1 - .3s) \le 0 \\ & x_{s1} + x_{s2} = 1 \\ & x_{s1},\ x_{s2} \ge 0 \end{aligned} \tag{21} $$


To apply our method, we need an initial value of s. The bounds from equation (20) for this example are

$$ \tfrac{1}{3} = m \le v \le M = \tfrac{3}{4}. $$

We choose our initial value of s between the above bounds; and for convenience, we try s = .5. The resulting optimal solution to (21) is

$$ \hat u_{.5} = 0.025, \qquad \hat x_1 = \tfrac{3}{4}, \qquad \hat x_2 = \tfrac{1}{4}. $$

Now we want to choose s to get $\hat u_s \le 0$. From the discussion following equation (20), we have $\hat u_M \le 0$; and for convenience, we select s = .7. The resulting optimal solution to (21) is

$$ \hat u_{.7} = -0.0288, \qquad \hat x_1 = \tfrac{13}{16}, \qquad \hat x_2 = \tfrac{3}{16}. $$

We use linear interpolation between s = .5 and s = .7 to approximate the value of s which gives $\hat u_s = 0$, i.e.,

$$ s = .5 + .2 \left( \frac{.025}{.025 + .0288} \right) = 0.593. $$

Since $\hat u_s \approx 0$ at this s, we conclude that $v \approx .593$. We round v off to .6 and solve problem (21) with s = .6 to obtain the following optimal strategies:

$$ \hat x_1 = \tfrac{7}{9}, \quad \hat x_2 = \tfrac{2}{9}, \qquad \hat y_1 = \tfrac{5}{9}, \quad \hat y_2 = \tfrac{4}{9}. $$

In this example, P1 can detect P2 with probability at least $v \approx .593$ by playing optimally. Of course, P2 can prevent P1 from obtaining a larger probability of detection than .593 by also playing optimally.
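The hand computation above can be reproduced with the closed-form solution of a 2 × 2 game $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$: when there is no pure saddle point, the row player's optimal probability of the first row is $(d - c)/(a + d - b - c)$, the column player's probability of the first column is $(d - b)/(a + d - b - c)$, and the value is $(ad - bc)/(a + d - b - c)$. The sketch below uses that formula in place of the simplex solution of (21); the function names are hypothetical.

```python
def solve_2x2(M):
    """Value and optimal strategies (x, y) of a 2x2 zero-sum game; row player maximizes."""
    (a, b), (c, d) = M
    for i in range(2):
        for j in range(2):
            # A pure saddle point is a row minimum that is also a column maximum.
            if M[i][j] == min(M[i]) and M[i][j] == max(M[0][j], M[1][j]):
                x = [1.0 if k == i else 0.0 for k in range(2)]
                y = [1.0 if k == j else 0.0 for k in range(2)]
                return M[i][j], x, y
    den = a + d - b - c          # nonzero when there is no pure saddle point
    x1, y1 = (d - c) / den, (d - b) / den
    return (a * d - b * c) / den, [x1, 1.0 - x1], [y1, 1.0 - y1]


A = [[0.1, 0.2], [0.3, 0.1]]
Q = [[0.2, 0.3], [0.4, 0.3]]


def game(s):
    """Solve the game A - sQ, i.e. linear program (21) for a given s."""
    return solve_2x2([[A[i][j] - s * Q[i][j] for j in range(2)] for i in range(2)])


u5, x5, _ = game(0.5)
u7, x7, _ = game(0.7)
s_new = 0.5 + 0.2 * u5 / (u5 - u7)       # linear interpolation to u_s = 0
u6, x6, y6 = game(0.6)
print(u5, x5, u7, x7, round(s_new, 3), x6, y6)
```

It reproduces the figures in the text: $\hat u_{.5} = 0.025$ with $\hat x_1 = 3/4$, $\hat u_{.7} = -0.02875$ with $\hat x_1 = 13/16$, the interpolated s = 0.593, and the s = .6 strategies (7/9, 2/9) and (5/9, 4/9).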

We may compare this solution to the solution of the finite game with the same payoff matrix and stop probability matrix (see section 3.4). The probability that P1 detects P2 in at most three steps was $\hat v_1$ = .3o2. Also, when the game lasts one step, we have an n-cell game of Chapter 2. In this case, the probability of detection is .167. This completes the discussion of the example, and we turn to a special case of the most general game.
3.8 A Special Case: Minimax the Expected Duration of the Game

We investigate the special case when P1 seeks to minimize the expected duration of the game and P2 seeks to maximize it. To obtain this objective, we must take $a_{ij} = 1$ (all i, j). Then, from equation (7), the expected accumulated payment received by P1 is the expected duration of the game. With all $a_{ij} = 1$, equation (8) becomes

$$ v(X, Y) = \frac{1}{X^t Q Y}. $$

We want P1 to be the minimizing player, so we seek to solve the equation

$$ \min_X \max_Y \frac{1}{X^t Q Y} = \frac{1}{\hat X^t Q \hat Y} = v. \tag{22} $$

Because $X^t Q Y > 0$ and $1/t$ is decreasing for $t > 0$, v, $\hat X$, and $\hat Y$ satisfy (22) if, and only if, they satisfy

$$ \max_X \min_Y X^t Q Y = \hat X^t Q \hat Y = \frac{1}{v}. \tag{23} $$

Hence, P1 can minimax the expected duration of the game by using a maximin strategy for the probability that the game terminates in one step.

We can solve equation (23) by the following familiar linear programming formulation of a matrix game:

$$ \begin{aligned} \max\ & u \\ \text{subject to}\ & u e^t - X^t Q \le 0 \\ & X^t e = 1 \\ & X \ge 0 \end{aligned} \tag{24} $$

Let $\hat u$, $\hat X$ be an optimal solution to (24) and $\hat Y$ a dual optimal strategy. Then

$$ \hat u = \hat X^t Q \hat Y = \max_X \min_Y X^t Q Y, $$

and from the equivalence of the optimal solutions to (22) and (23),

$$ \frac{1}{\hat u} = \min_X \max_Y \frac{1}{X^t Q Y} = \frac{1}{\hat X^t Q \hat Y}. $$

This is the desired solution to the infinite sequential game when the objective is to minimax the expected duration of the game.
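Linear program (24) is an ordinary matrix-game LP, so the expected-duration problem takes a single solve. The sketch below passes (24) to SciPy's `scipy.optimize.linprog` and reports $1/\hat u$ as the minimax expected duration. It uses the stop matrix Q of the example in section 3.7 purely for illustration, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def minimax_duration(Q):
    """Solve LP (24): max u s.t. u e^t - X^t Q <= 0, X^t e = 1, X >= 0.

    Returns (u_hat, X_hat, 1 / u_hat); 1 / u_hat is the minimax expected
    duration of the game (in moves) when every a_ij = 1.
    """
    Q = np.asarray(Q, dtype=float)
    n, m = Q.shape
    res = linprog(c=np.r_[np.zeros(n), -1.0],            # maximize u
                  A_ub=np.c_[-Q.T, np.ones(m)], b_ub=np.zeros(m),
                  A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    if not res.success:
        raise RuntimeError("linear program did not solve")
    u_hat = -res.fun
    return u_hat, res.x[:n], 1.0 / u_hat


Q = [[0.2, 0.3], [0.4, 0.3]]
u_hat, X_hat, duration = minimax_duration(Q)
print(u_hat, X_hat, duration)
```

For this Q the second row dominates the first, the game has a saddle point with $\hat u = .3$, and the minimax expected duration is $1/.3 \approx 3.33$ moves.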

We solve the example of the last section when minimaxing the expected duration is desired. Here


3.9 A Stop Strategy and Dominance


A feature that can easily be included in the sequential games is a stop strategy for P1, which allows P1 to terminate the search if he so chooses. In the finite game, this option can be included by simply adjoining an additional row (n + 1) to each of the matrices $A_r$ and $P_r$. Since row n + 1 is to be a stop strategy, we require that row n + 1 of the matrix $P_r$ contain all zeros. By solving this new game, we find the moves for which P1 chooses row n + 1 with positive probability or with zero probability. In this way, we will have an optimal stopping rule for the game. We do not pursue this point further, since the moves for which P1 employs his stop strategy with probability zero will depend on the specified data ($A_r$ and $P_r$).

We also apply the idea of a stop strategy to the infinite sequential game. Again, we adjoin an additional row n + 1 to the A and P matrices, with row n + 1 of the P matrix all zeros. We could establish a sufficient condition for P1 to choose row n + 1 with probability zero. In this case, P1 will allow the game to terminate by the already specified means and will not abandon the search at any move. To obtain this sufficient condition, we would require the notion of dominance for the infinite sequential game. By Lemma 1, dominance in the sequential game is equivalent to ordinary dominance in the equivalent two-person zero-sum game with payoff matrix $A - vQ$. The desired result easily follows from this notion of dominance.


CHAPTER IV - TACTICAL STOCHASTIC GAMES¹


4.1 Introduction

(a)  The  Problem 

This chapter is concerned with the development of models and methods for finding optimal tactics in an idealization of Antisubmarine Warfare (ASW). We view the ASW problem as a game of pursuit between the hunter-killer force (player 1) and a possible submarine (player 2). The pursuit begins with a contact, which is an indication of a possible submarine by the sensors of one or more units of the hunter-killer force. The pursuit ends when the contact is "caught" or, in some cases, evades the hunter-killer force. A catch may correspond to the attainment of one of several military objectives, such as positive identification that the contact is or is not a submarine, or sinking of the submarine. In any event, a catch is a specified terminal condition for the pursuit.

The status of the pursuit at every move t (t = 1, 2, ...) is taken to be one of a finite number of possible states. A state summarizes the tactical information which is available to both players for decision making. For example, each state may correspond to one of a finite number of possible configurations of the hunter-killer forces which may hold the contact on their sensors. Then, at every move t, each player determines the state of the pursuit by observing the configuration of the hunter-killer forces which are holding the contact. Thus, a finite collection of states numbered i = 1, ..., n is specified. When the pursuit has not terminated, it must be in one and only one of these states at each move.

¹Much of the work in this chapter is also contained in Charnes and Schroeder [1].

The structure of the problem also includes a finite collection of
tactical plans (decisions) associated with each state. A plan specifies
the tactics which a player will use until his next move. In the most
general case, we assume that the players simultaneously choose a plan
after the state of the pursuit is observed. When the pursuit is in state i,
we number the available plans k = 1, ..., $M_i$ and h = 1, ..., $N_i$ for
players 1 and 2 respectively. When the players move, they each choose
a plan and thereby jointly determine an immediate "payoff" from player
2 to player 1 and a transition probability distribution over the states.
Before the next move is made, the game transits to one of the states or
terminates according to the chosen probability distribution.* We
assume that the game is zero sum.

*Each move which we consider consists of both a personal and a
chance move in the sense of von Neumann and Morgenstern [1].
We consider different payoffs corresponding to different ASW
objectives. Two reasonable ASW objectives for player 1 are:

(1) Minimize the expected duration of the game.

(2) Maximize the probability of a catch.

In case (1), the payoff for every pair of state and plan is the time taken
by one move; that is, the hunter-killer force wishes to catch the submarine
in minimum time. With objective (2), we must have at least two terminal
conditions for the pursuit, for in this case the hunter-killer force
attempts to maximize the probability of catching the submarine and is
faced with the possibility that the pursuit may terminate with conditions
other than a catch.

In short, the problem consists of a finite collection of states
which summarizes the tactical information available to both players. At
each move, the players observe the state of the game and each player
chooses a tactical plan from a finite collection. The chosen tactical
plans jointly determine an immediate payoff and a transition probability
distribution over the states. Before the next move is made, the game
transits to one of the states or terminates according to the chosen
probability distribution. Our task is to find an optimal strategy for each
player. A strategy is a decision (possibly randomized) for each state and
move. An optimal strategy is one of a minimax pair for the total
expected payoff. For convenience, unless otherwise noted, we shall take
player 1 to be the maximizing, and player 2 the minimizing, player.

(b)  The  Models 

To describe the above ASW situation, we consider a basic
model and four variants. The basic model is a stochastic game due to
Shapley [1]. We call this game a Terminating Stochastic Game (TSG)
to distinguish it from the non-terminating variant introduced by
Hoffman and Karp [1]. Shapley defined a vector value for a TSG and
employed an ingenious argument to establish its existence and that of
optimal strategies. The methods and representations he employed were
of a nonlinear character. We show, however, that linear programming
can be used to characterize the value of the game and its optimal
strategies, as well as to obtain them to within a desired degree of
approximation. In addition, we determine the effect of near-optimal
strategies on the total expected payoff for the TSG.

Next, we discuss two variants of the TSG which lend considerably
more realism to the game for ASW purposes. The first involves
a modified assumption on the transition probabilities from that employed
by Shapley. No change in the solution techniques developed for the
basic game is required by this modification, although it enlarges the
class of problems which may be solved. The second variant involves an
extension of the notion of a constrained game (Charnes [1]) to stochastic
games and is exemplified in a particular type of "constrained" TSG.
Here an implicit restriction on the duration of the game is imposed by
means of constraints on the strategies.

Another description of the ASW situation may be obtained from
a TSG with perfect information. We discuss its advantages in describing
the ASW problem and exhibit a linear program whose solution yields
the value and optimal strategies for a general TSG with perfect
information. The existence and uniqueness of the value are also established
directly from this linear program.

Finally, we introduce a finite version of a TSG. This finite
TSG is applicable to the ASW situation when the pursuit is known to
terminate in, at most, a finite number of steps. This finiteness allows
us to relax certain assumptions which are required in the infinite case
and, thus, additional realism can be introduced into the model. Again,
however, our basic linear programming techniques hold good and yield
constructive procedures.
4.2 Formulation of a Terminating Stochastic Game

In this section, we define the TSG and present two basic theorems
due to Shapley. A TSG is played in a sequence of moves. At
each move, the game is said to be in one of a finite number of states
numbered i = 1, ..., n. If the game is in state i (i = 1, ..., n) and
player 1, the maximizing player, chooses alternative k and player 2
chooses alternative h, then the payoff to player 1 from player 2 is

$$a_i^{kh}, \qquad k = 1, \dots, M_i, \quad h = 1, \dots, N_i.$$

Since we have assumed the game is zero sum, player 2 receives, of
course, $-a_i^{kh}$. The choice of alternatives k and h also determines the
transition probabilities

$$p_{ij}^{kh} \ge 0, \qquad j = 1, \dots, n, \quad k = 1, \dots, M_i, \quad h = 1, \dots, N_i,$$

where $p_{ij}^{kh}$ is the conditional probability that the game will be in state j
on the next move given that it is in state i and that alternatives k and h
are chosen by players 1 and 2 respectively. Hereafter, if the range of
the subscripts i, j, k, h is omitted, their full range is intended.

We assume:

$$\begin{aligned}
&\text{(i)} \quad && \sum_{j=1}^{n} p_{ij}^{kh} < 1, && \text{all } k, h, i \\
&\text{(ii)} \quad && |a_i^{kh}| \le M, && \text{all } k, h, i
\end{aligned} \tag{2}$$

Under these assumptions, the game terminates with probability one and
the accumulated payments received by either player are bounded. To
verify this statement, we let

$$s_i^{kh} = 1 - \sum_{j=1}^{n} p_{ij}^{kh} > 0;$$

$s_i^{kh}$ is the positive probability of termination given state i and decisions k and h.
Let

$$s = \min_{i,k,h} s_i^{kh}. \tag{3}$$

Now the probability that the game does not terminate in N moves is not
more than $(1 - s)^N$. Since this quantity tends to zero as N increases
without limit, the game terminates with probability one. The accumulated
payments received by either player are bounded by

$$M + (1 - s)M + (1 - s)^2 M + \cdots = \frac{M}{s}.$$
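Both bounds are easy to evaluate for a concrete game. The following Python sketch is illustrative only: it assumes the transition probabilities are stored as nested lists `p[i][k][h][j]` $= p_{ij}^{kh}$ (a layout chosen here, not taken from the text), computes s from equation (3), the payment bound M/s, and the bound $(1-s)^N$ on the probability that the game survives N moves. The numbers are hypothetical.

```python
from fractions import Fraction as F


def termination_bounds(p, m_bound):
    """Return (s, m_bound / s) for a TSG.

    p[i][k][h][j] holds p_ij^{kh}; m_bound is M from assumption (ii).
    s is the smallest one-move termination probability s_i^{kh} (equation (3)).
    """
    s = min(1 - sum(p_ikh) for p_i in p for p_ik in p_i for p_ikh in p_ik)
    if s <= 0:
        raise ValueError("assumption (i) fails: some s_i^kh is not positive")
    return s, m_bound / s


def survival_bound(s, moves):
    """Upper bound (1 - s)^N on Pr{no termination within N moves}."""
    return (1 - s) ** moves


# Hypothetical data: state 1 has 2 x 2 alternatives, state 2 has 1 x 2.
p = [
    [[[F(1, 2), F(1, 4)], [F(1, 4), F(1, 4)]],
     [[F(0), F(1, 2)], [F(1, 3), F(1, 3)]]],
    [[[F(1, 4), F(1, 4)], [F(1, 2), F(0)]]],
]
s, bound = termination_bounds(p, m_bound=2)
print(s, bound, survival_bound(s, 10))
```

Exact fractions are used so that s and M/s come out exactly; floats would work equally well for larger models.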

A strategy for a move could depend on the entire previous
history of the game play. Fortunately, it is only necessary to consider
"behavior strategies" (stationary strategies), since the optimal
strategies are found in this class (Kuhn [1], Shapley [1]).

Def. A behavior strategy X for player 1 is an n-tuple of probability
distributions $X = (X_1, \dots, X_n)$, each $X_i = (x_i^1, \dots, x_i^{M_i})$.
A similar definition holds for player 2.

If player 1 uses a behavior strategy X, he chooses the mixed
strategy $X_i$ whenever the game is in state i, regardless of what move
it is or of the manner of arrival at state i.

By choosing a starting state i, we obtain an infinite¹ game $G_i$
(i = 1, ..., n). A TSG, G, is defined as the collection of games
$G = (G_1, \dots, G_n)$. Let $\hat w_i$ denote the value of $G_i$, the minimax of its
total expected payoffs. Now we define the value of G to be the vector

$$\hat W = (\hat w_1, \dots, \hat w_n).$$

We introduce a two-person zero-sum game with payoff matrix
$A_i(v)$, where $A_i(v)$, i = 1, ..., n, is the $M_i \times N_i$ matrix whose (k, h)th
element is

$$a_i^{kh} + \sum_{j=1}^{n} p_{ij}^{kh} v_j \tag{4}$$

and v is the n-vector of real numbers $v = (v_1, \dots, v_n)$. Finally, let
Val[B] denote the minimax value of the two-person zero-sum game
with payoff matrix B, and let X[B] and Y[B] denote the sets of optimal
mixed strategies for players 1 and 2 respectively. Now we state two
basic theorems due to Shapley [1].

¹The number of moves may not be bounded.
Theorem 1 (Shapley): The value of the terminating stochastic
game G is the unique solution $\hat W$ of the nonlinear system of equations

$$\hat w_i = \mathrm{Val}[A_i(\hat W)], \qquad i = 1, \dots, n. \tag{5}$$

Theorem 2 (Shapley): The behavior strategies $\hat X$, $\hat Y$, where

$$\hat X_i \in X[A_i(\hat W)], \quad \hat Y_i \in Y[A_i(\hat W)], \qquad i = 1, \dots, n,$$

are optimal for the first and second players respectively in every game belonging to G.

These theorems provide a basis for the results of the following section.


4.3 Solution of a TSG

In this section, we develop an iterative technique which employs
a contraction mapping whose unique fixed point is the value of the game.
This mapping is applied recursively from a selected starting point,
and each iteration of the mapping is obtained by solving a set of linear
programs. Truncation of the recursive technique yields near-optimal
strategies of the TSG, and we can determine in advance the effect on
the total expected payoffs when such strategies are to be used.

In order to define the contraction mapping, consider the
n-dimensional real vector space $R^n$ with the norm $\|v\| = \max_{1 \le i \le n} |v_i|$.

$$v(t+1) = T v(t), \qquad t = 0, 1, \dots \tag{9}$$


Then, by the definition of T,

$$v_i(t+1) = \mathrm{Val}[A_i(v(t))], \qquad i = 1, \dots, n, \quad t = 0, 1, \dots$$

By the contraction property of T, the sequence {v(t)} converges to
$\hat W$ for every fixed v(0) (see Kolmogorov and Fomin [1]). Note that
if we choose v(0) = 0, then v(N) is the value of the TSG which is
truncated (stopped) after N moves, if it lasts that long. We shall
return to this point later.

Now the sequence {v(t)} may be computed by linear programming.
Indeed, the ith program in the following collection is a linear
programming formulation* of the game $A_i(v(t))$, where v(t) is known.
Accordingly, the optimal solution $u_i(t)$ exists and satisfies
$u_i(t) = \mathrm{Val}[A_i(v(t))]$, i = 1, ..., n.

L.P. (i, v(t)), i = 1, ..., n, t = 0, 1, ...:

$$\begin{aligned}
\max \quad & u_i(t) \\
\text{subject to:} \quad & u_i(t) - \sum_{k=1}^{M_i} x_i^k(t)\Big(a_i^{kh} + \sum_{j=1}^{n} p_{ij}^{kh}\, v_j(t)\Big) \le 0, \qquad h = 1, \dots, N_i, \\
& \sum_{k=1}^{M_i} x_i^k(t) = 1, \\
& x_i^k(t) \ge 0, \qquad k = 1, \dots, M_i.
\end{aligned}$$

*This formulation is a variant of that in Charnes [1], which has
the same advantage that the dual programs correspond precisely to the
play problems of the respective players.

Given v(t), we compute $U(t) = (u_1(t), \dots, u_n(t))$ from the above linear
programs and set v(t + 1) = U(t). In this manner, the sequence {v(t)}
for given v(0) is generated.
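The iteration can be sketched in a few lines of Python. This is a hedged illustration, not the author's code: instead of handing L.P. (i, v(t)) with its free variable $u_i(t)$ to an LP library, it solves each game $A_i(v(t))$ with the standard simplex-tableau pivot method for matrix games (after shifting all entries to be positive), which solves an equivalent linear program exactly in rational arithmetic. The data layout `a[i][k][h]`, `p[i][k][h][j]` and the one-state example are assumptions made for illustration; for that example $\hat W = 4/3$ and, with v(0) = 0, $v(t) = \tfrac{4}{3}(1 - 2^{-t})$.

```python
from fractions import Fraction


def solve_matrix_game(B):
    """Value and optimal mixed strategies of matrix game B; the row player maximizes.

    Simplex-tableau pivoting for matrix games, exact for int/Fraction entries.
    Returns (value, x, y).
    """
    m, n = len(B), len(B[0])
    shift = 1 - min(min(row) for row in B)       # every entry, hence the value, >= 1
    T = [[Fraction(B[i][j]) + shift for j in range(n)] + [Fraction(1)] for i in range(m)]
    T.append([Fraction(-1)] * n + [Fraction(0)])
    left = [("x", i) for i in range(m)]          # labels of the tableau rows
    top = [("y", j) for j in range(n)]           # labels of the tableau columns
    while True:
        neg = [j for j in range(n) if T[m][j] < 0]
        if not neg:
            break
        q = neg[0]                                # entering column
        p = min((i for i in range(m) if T[i][q] > 0), key=lambda i: T[i][n] / T[i][q])
        piv = T[p][q]
        new = [[T[i][j] - T[p][j] * T[i][q] / piv for j in range(n + 1)]
               for i in range(m + 1)]
        for j in range(n + 1):
            new[p][j] = T[p][j] / piv             # pivot row
        for i in range(m + 1):
            new[i][q] = -T[i][q] / piv            # pivot column
        new[p][q] = 1 / piv
        T = new
        left[p], top[q] = top[q], left[p]
    corner = T[m][n]
    x, y = [Fraction(0)] * m, [Fraction(0)] * n
    for j, (who, idx) in enumerate(top):
        if who == "x":
            x[idx] = T[m][j] / corner
    for i, (who, idx) in enumerate(left):
        if who == "y":
            y[idx] = T[i][n] / corner
    return 1 / corner - shift, x, y


def stage_matrix(a, p, v, i):
    """A_i(v) of equation (4): entry (k, h) is a_i^{kh} + sum_j p_ij^{kh} v_j."""
    return [[a[i][k][h] + sum(pj * vj for pj, vj in zip(p[i][k][h], v))
             for h in range(len(a[i][k]))] for k in range(len(a[i]))]


def shapley_step(a, p, v):
    """v(t+1) = T v(t): solve the game A_i(v(t)) in every state i."""
    return [solve_matrix_game(stage_matrix(a, p, v, i))[0] for i in range(len(a))]


# Hypothetical one-state TSG: payoffs [[2, 0], [0, 1]], continuation probability 1/2.
a = [[[2, 0], [0, 1]]]
p = [[[[Fraction(1, 2)], [Fraction(1, 2)]], [[Fraction(1, 2)], [Fraction(1, 2)]]]]
v = [Fraction(0)]
for t in range(3):
    v = shapley_step(a, p, v)
print(v)  # v(3) = 4/3 * (1 - 1/8) = 7/6
```

The pivot method is used here only to keep the sketch free of third-party dependencies; an LP solver applied directly to L.P. (i, v(t)) would return the same $u_i(t)$.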

When computing the sequence {v(t)}, it is desirable to have a
stopping criterion which ensures a desired approximation to $\hat W$. More
precisely, given arbitrary ε > 0, we will find an integer N such that
$\|\hat W - v(N)\| \le \epsilon$. Returning to the contraction mapping T and recalling
that $T\hat W = \hat W$, we have from the contraction property of T

$$\|\hat W - T^{N+1} v\| \le (1 - s)\,\|\hat W - T^{N} v\|, \quad \text{all } v \in R^n. \tag{10}$$

Also, by the triangle inequality,

$$\|\hat W - T^{N} v\| \le \|\hat W - T^{N+1} v\| + \|T^{N+1} v - T^{N} v\|.$$

Thus $\|\hat W - T^{N} v\| \le (1 - s)\,\|\hat W - T^{N} v\| + \|T^{N+1} v - T^{N} v\|$, and

$$s\,\|\hat W - T^{N} v\| \le \|T^{N+1} v - T^{N} v\|, \quad \text{all } v \in R^n. \tag{11}$$

But, by definition,

$$T^m v(0) = v(m), \qquad m = 0, 1, \dots$$

Therefore

$$\|\hat W - v(N)\| \le \frac{1}{s}\,\|v(N+1) - v(N)\|.$$

If we compute the sequence {v(t)} until $\|v(N+1) - v(N)\| \le s\epsilon$, then
$\|\hat W - v(N)\| \le \epsilon$. The actual number of iterations required will depend,
in general, on $\|\hat W - v(0)\|$, ε, and s.
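A sketch of this stopping rule, under simplifying assumptions: every state offers each player exactly two alternatives, so Val can be taken from the classical closed form for 2 × 2 games, $(ad - bc)/(a + d - b - c)$ when there is no pure saddle point, instead of a linear program. The hypothetical one-state game below has the known value $\hat W = 4/3$ and s = 1/2, so the guarantee $\|\hat W - v(N)\| \le \epsilon$ can be checked directly.

```python
from fractions import Fraction as F


def val2x2(m):
    """Value of a 2 x 2 zero-sum game (row player maximizes), classical closed form."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))     # best pure guarantee for the maximizer
    upper = min(max(a, c), max(b, d))     # best pure guarantee for the minimizer
    if lower == upper:                    # saddle point in pure strategies
        return lower
    return (a * d - b * c) / (a + d - b - c)


def iterate_until_stop(a, p, s, eps, v0):
    """Apply T until ||v(N+1) - v(N)|| <= s * eps in the max norm; return (N, v(N))."""
    v, steps = list(v0), 0
    while True:
        nxt = [val2x2([[a[i][k][h] + sum(q * w for q, w in zip(p[i][k][h], v))
                        for h in range(2)] for k in range(2)]) for i in range(len(a))]
        if max(abs(x - y) for x, y in zip(nxt, v)) <= s * eps:
            return steps, v
        v, steps = nxt, steps + 1


a = [[[2, 0], [0, 1]]]                                  # hypothetical one-state game
p = [[[[F(1, 2)], [F(1, 2)]], [[F(1, 2)], [F(1, 2)]]]]  # s = 1/2 everywhere
n_stop, v_n = iterate_until_stop(a, p, s=F(1, 2), eps=F(1, 1000), v0=[F(0)])
print(n_stop, float(abs(F(4, 3) - v_n[0])))
```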

One may also easily bound by R the maximum number of iterations
required after one iteration is computed. For, observe that for
every integer m ≥ 1

$$s\,\|\hat W - T^m v\| \le \|T^{m+1} v - T^m v\| \le (1 - s)^m\,\|Tv - v\|. \tag{12}$$

Now, after v(1) is computed from v(0), choose R such that

$$(1 - s)^R\,\|v(1) - v(0)\| \le s\epsilon.$$

Then $\|\hat W - v(R)\| \le \epsilon$.

By the inequalities in (12), R ≥ N, where N is the stopping point obtained
by the methods of the preceding paragraph. We conclude that after one
iteration we will have an upper bound on the total number of iterations
required for given accuracy.
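The a priori bound R needs only s, ε and the first step $\|v(1) - v(0)\|$. A minimal Python sketch in exact arithmetic; the inputs (s = 1/2, $\|v(1) - v(0)\| = 2/3$, ε = 1/1000) are hypothetical.

```python
from fractions import Fraction as F


def a_priori_iterations(first_step, s, eps):
    """Smallest R with (1 - s)**R * ||v(1) - v(0)|| <= s * eps.

    first_step is ||v(1) - v(0)|| in the max norm; by inequality (12)
    the returned R bounds the stopping index N from above.
    """
    r, bound = 0, first_step
    while bound > s * eps:
        bound *= 1 - s
        r += 1
    return r


print(a_priori_iterations(F(2, 3), F(1, 2), F(1, 1000)))  # 11
```

A loop is used instead of a logarithm so that the result is exact at the boundary case where $(1-s)^R \|v(1)-v(0)\|$ equals sε.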

Next, we investigate the effect of near-optimal strategies on the
total expected payoff. As before, let $u_i(N)$ and $\hat X_i(N) = (\hat x_i^1(N), \dots, \hat x_i^{M_i}(N))$
be an optimal solution to L.P. (i, v(N)) and assume, given ε > 0, that
N is chosen such that $\|U(N) - v(N)\| \le s\epsilon$, where
$U(N) = (u_1(N), \dots, u_n(N))$. Then from equations (10) and (11), $\|U(N) - \hat W\| \le \epsilon$.

Let $\hat Y_i(N)$ be an optimal strategy for player 2 in the game $A_i(v(N))$.
Then $\hat Y_i(N)$ is an optimal strategy in the dual to L.P. (i, v(N)) and is,
of course, at hand when the direct problem is solved (see Charnes [1]).
Let $U = (u_1, \dots, u_n)$ be the expected payoff in the TSG when the known
strategies $\hat X(N)$ and $\hat Y(N)$ are used in every move of G. We wish to
find the difference in norm between U and the value of G, $\hat W$. First,
we compute the difference in norm between U and U(N). Let

$$\bar p_{ij} = \sum_{k,h} \hat x_i^k(N)\, p_{ij}^{kh}\, \hat y_i^h(N) \quad \text{and} \quad \bar a_i = \sum_{k,h} \hat x_i^k(N)\, a_i^{kh}\, \hat y_i^h(N).$$

Then U is given by the solution to the system

$$u_i = \bar a_i + \sum_{j=1}^{n} \bar p_{ij}\, u_j, \qquad i = 1, \dots, n. \tag{13}$$

This solution is unique since $\sum_{j} \bar p_{ij} \le 1 - s < 1$ for all i. Now U(N) is related
to v(N) by the linear programs L.P. (i, v(N)), i = 1, ..., n. From
primal-dual considerations

$$u_i(N) - \bar a_i - \sum_{j=1}^{n} \bar p_{ij}\, v_j(N) = 0, \qquad i = 1, \dots, n. \tag{14}$$

Subtracting equation (14) from equation (13), we obtain

$$u_i - u_i(N) = \sum_{j=1}^{n} \bar p_{ij}\,\big(u_j - v_j(N)\big), \qquad i = 1, \dots, n.$$

From our assumption $\|U(N) - v(N)\| \le s\epsilon$, we may write

$$u_j(N) - v_j(N) = s\epsilon_j, \qquad j = 1, \dots, n, \quad \text{with } |\epsilon_j| \le \epsilon.$$
Then the desired relationship between U and U(N) is

$$u_i - u_i(N) = \sum_{j=1}^{n} \bar p_{ij}\,\big(u_j - u_j(N) + s\epsilon_j\big), \qquad i = 1, \dots, n.$$

Further,

$$|u_i - u_i(N)| \le \sum_{j=1}^{n} \bar p_{ij}\,|u_j - u_j(N)| + \sum_{j=1}^{n} \bar p_{ij}\, s\,|\epsilon_j|, \qquad i = 1, \dots, n,$$

and since $\sum_{j=1}^{n} \bar p_{ij} \le 1 - s$, i = 1, ..., n, we have

$$\|U - U(N)\| \le (1 - s)\,\|U - U(N)\| + s(1 - s)\epsilon,$$

$$\|U - U(N)\| \le (1 - s)\epsilon.$$

Finally, the difference in norm between U and $\hat W$ is bounded by

$$\|U - \hat W\| \le \|U - U(N)\| + \|U(N) - \hat W\| \le (1 - s)\epsilon + \epsilon = (2 - s)\epsilon.$$

From the above inequality, we see that one can find a priori an integer N
such that the behavior strategies $\hat X(N)$ and $\hat Y(N)$ can be used in the TSG,
G, and the total expected payoff obtained will be as close to $\hat W$ as
prescribed.
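Equation (13) is an ordinary linear system, so U can be computed directly once the stationary strategies are fixed. The sketch below assumes nested lists `a[i][k][h]`, `p[i][k][h][j]`, `x[i][k]`, `y[i][h]` (hypothetical names and layout), forms $\bar a_i$ and $\bar p_{ij}$, and solves $(I - \bar P)U = \bar a$ by Gauss-Jordan elimination in exact arithmetic. In the hypothetical one-state example, the pair x = y = (1/3, 2/3), which is optimal there, reproduces the value 4/3.

```python
from fractions import Fraction as F


def evaluate_strategies(a, p, x, y):
    """Expected total payoff U = (u_1, ..., u_n) when x and y are used in every move.

    a[i][k][h] = a_i^{kh}, p[i][k][h][j] = p_ij^{kh}; x[i][k] and y[i][h] are the
    stationary mixed strategies. Solves (I - P_bar) U = a_bar, i.e. equation (13).
    """
    n = len(a)

    def pairs(i):
        return [(k, h) for k in range(len(x[i])) for h in range(len(y[i]))]

    a_bar = [sum(x[i][k] * a[i][k][h] * y[i][h] for k, h in pairs(i)) for i in range(n)]
    p_bar = [[sum(x[i][k] * p[i][k][h][j] * y[i][h] for k, h in pairs(i)) for j in range(n)]
             for i in range(n)]
    aug = [[F(int(i == j)) - F(p_bar[i][j]) for j in range(n)] + [F(a_bar[i])]
           for i in range(n)]
    for c in range(n):
        sel = next(r for r in range(c, n) if aug[r][c] != 0)  # I - P_bar is nonsingular
        aug[c], aug[sel] = aug[sel], aug[c]
        pivot = aug[c][c]
        aug[c] = [e / pivot for e in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [e - factor * g for e, g in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


a = [[[2, 0], [0, 1]]]                                  # hypothetical one-state game
p = [[[[F(1, 2)], [F(1, 2)]], [[F(1, 2)], [F(1, 2)]]]]
third = [[F(1, 3), F(2, 3)]]
print(evaluate_strategies(a, p, third, third))          # U = 4/3
print(evaluate_strategies(a, p, [[1, 0]], [[1, 0]]))    # both always play alternative 1
```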

We summarize the results of this section with the following

Theorem 3: Let the sequence {v(t)} be defined by equation (9)
and let $\hat W$ be the value of the TSG. For given ε > 0, define N as the
smallest integer for which

$$\|v(N+1) - v(N)\| \le s\epsilon;$$

then

(i) $\|\hat W - v(N)\| \le \epsilon$.

Also, let $\hat X_i(N)$ and $\hat Y_i(N)$ be optimal strategies for P1 and P2
respectively in the game $A_i(v(N))$, i = 1, ..., n, and let U be the
accumulated payoff received by P1 when these strategies are used
in every move of the TSG; then

(ii) $\|U - \hat W\| \le (2 - s)\epsilon$.

To recapitulate in part, we have defined a nonlinear contraction
mapping T whose unique fixed point is $\hat W$. We have shown how to
replace the fixed point problem by optimizing a linear programming
formulation. In this way, the successive terms of the sequence $\{T^t v\}$ were
computed, and a stopping criterion was developed which ensured the
desired approximation to $\hat W$. Finally, the linear programs L.P. (i, v(t))
(i = 1, ..., n) yielded a dual pair of optimal strategies $\hat X(t)$, $\hat Y(t)$, and
we obtained the effect on the total expected payoff when these strategies
are used in the TSG.
4.4 Another Solution Method

In this section, we develop another iterative method to compute
the value and optimal strategies for a TSG. This method is closely
related to one proposed by Hoffman and Karp [1] for nonterminating
stochastic games, and it is also an extension of Howard's [1] policy
iteration method to stochastic games. One iteration of our method
consists of starting with a strategy for P1 and, in a certain way,
computing a new strategy for P1. Thus, the method iterates on strategies
for P1, as opposed to the method of the last section, which iterated on the
"state values".

Next, we describe the method and then establish some properties
of the quantities which are generated on successive iterations.

Method II

1. Choose a behavior strategy $X(0) = (X_1(0), X_2(0), \dots, X_n(0))$.

2. Given X(t) with $X_i(t) = (x_i^1(t), x_i^2(t), \dots, x_i^{M_i}(t))$, i = 1, ..., n,
   find the solution to the system of equations

$$w_i(t) = \min_{1 \le h \le N_i} \sum_{k=1}^{M_i} x_i^k(t)\Big[a_i^{kh} + \sum_{j=1}^{n} p_{ij}^{kh}\, w_j(t)\Big], \qquad i = 1, \dots, n. \tag{15}$$

   (The solution $W(t) = (w_1(t), \dots, w_n(t))$ is unique and may be
   found by solving a linear program of the type given in section 4.8.)

3. Now $X(t+1) = (X_1(t+1), \dots, X_n(t+1))$ is determined by
   finding an optimal strategy for P1, $X_i(t+1)$, in the games
   $A_i(W(t))$, i = 1, ..., n. Return to step 2.
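Method II can be sketched for the special case in which every state offers each player two alternatives. Two substitutions are assumptions of this sketch, not the text's procedure: step 2 solves equation (15) by successive approximation (which converges under assumption (i)) rather than by the linear program of section 4.8, and step 3 uses the closed-form optimal strategy of a 2 × 2 game. The recorded W(t) should be non-decreasing, as Lemma 1 below asserts. The game data are hypothetical.

```python
def p1_optimal_2x2(m):
    """An optimal mixed strategy (x1, x2) for the maximizing row player of a 2 x 2 game."""
    (a, b), (c, d) = m
    if max(min(a, b), min(c, d)) == min(max(a, c), max(b, d)):  # pure saddle point
        return (1, 0) if min(a, b) >= min(c, d) else (0, 1)
    x1 = (d - c) / (a + d - b - c)
    return (x1, 1 - x1)


def stage(a, p, w, i):
    """A_i(w), equation (4), for a state where each player has two alternatives."""
    return [[a[i][k][h] + sum(q * wj for q, wj in zip(p[i][k][h], w)) for h in range(2)]
            for k in range(2)]


def evaluate_x(a, p, x, tol=1e-12):
    """Step 2: solve equation (15), w_i = min_h sum_k x_i^k A_i(w)[k][h].

    Stand-in for the linear program of section 4.8: successive approximation,
    which converges because the map is a contraction under assumption (i).
    """
    w = [0.0] * len(a)
    while True:
        nxt = []
        for i in range(len(a)):
            A = stage(a, p, w, i)
            nxt.append(min(sum(x[i][k] * A[k][h] for k in range(2)) for h in range(2)))
        if max(abs(u - v) for u, v in zip(nxt, w)) < tol:
            return nxt
        w = nxt


def method_two(a, p, x0, sweeps=10):
    """Alternate step 2 (evaluate X(t)) and step 3 (X(t+1) optimal in A_i(W(t)))."""
    x, history = list(x0), []
    for _ in range(sweeps):
        w = evaluate_x(a, p, x)
        history.append(w)
        x = [p1_optimal_2x2(stage(a, p, w, i)) for i in range(len(a))]
    return x, history


# Hypothetical one-state game: payoffs [[2, 0], [0, 1]], continuation probability 1/2.
a = [[[2, 0], [0, 1]]]
p = [[[[0.5], [0.5]], [[0.5], [0.5]]]]
x, history = method_two(a, p, x0=[(1, 0)])
print(x[0], history[0][0], history[-1][0])  # roughly (1/3, 2/3), 0.0 and 4/3
```

Starting from the pure strategy (1, 0), the first evaluation gives W(0) = 0, after which the method moves to the mixed strategy (1/3, 2/3) and W(t) rises to 4/3.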

We show that the sequence {X(t)} converges to an optimal strategy,
$\hat X$, for P1, and that the sequence {W(t)} converges to $\hat W$, the value
of the TSG. First, we establish the following

Lemma 1: Successive solutions obtained from equation (15)
satisfy

$$W(t+1) \ge W(t).$$

(The inequality holds component-wise on the above vectors.)

Proof: From equation (15),

$$w_i(t) \le \sum_{k=1}^{M_i} x_i^k(t)\Big[a_i^{kh} + \sum_{j=1}^{n} p_{ij}^{kh}\, w_j(t)\Big], \qquad h = 1, \dots, N_i, \quad i = 1, \dots, n.$$

We adopt the simplified notation

$$a_i(X_i, Y_i) = \sum_{h=1}^{N_i} \sum_{k=1}^{M_i} x_i^k\, a_i^{kh}\, y_i^h, \quad i = 1, \dots, n, \qquad a(X, Y) = \big(a_1(X_1, Y_1), \dots, a_n(X_n, Y_n)\big),$$

$$p_{ij}(X_i, Y_i) = \sum_{h=1}^{N_i} \sum_{k=1}^{M_i} x_i^k\, p_{ij}^{kh}\, y_i^h, \qquad i, j = 1, \dots, n.$$

P(X, Y) is the n × n matrix $P(X, Y) = (p_{ij}(X_i, Y_i))$. With this notation,
equation (17) becomes

$$W(t) \le a(X(t), Y) + P(X(t), Y)\, W(t), \quad \text{all strategies } Y. \tag{18}$$

According to the proposed method, $X_i(t+1)$ is an optimal strategy for
P1 in the game $A_i(W(t))$. Let $Y_i(t+1)$ be an optimal strategy for P2
in this game. Then the pair of strategies X(t + 1), Y(t + 1) satisfies the
following saddle point condition

$$a(X, Y(t+1)) + P(X, Y(t+1))\, W(t) \le a(X(t+1), Y) + P(X(t+1), Y)\, W(t), \quad \text{all strategies } X, Y. \tag{19}$$

We set Y = Y(t + 1) in equation (18), X = X(t) in equation (19), and
use (18) and (19) together to get

$$W(t) \le a(X(t+1), Y) + P(X(t+1), Y)\, W(t), \quad \text{all strategies } Y. \tag{20}$$

Let $\tilde Y$ be a strategy for P2 which yields the solution to

$$W(t+1) = a(X(t+1), \tilde Y) + P(X(t+1), \tilde Y)\, W(t+1). \tag{21}$$

We set $Y = \tilde Y$ in (20) and subtract (21) from (20) to get

$$W(t) - W(t+1) \le P(X(t+1), \tilde Y)\,\big[W(t) - W(t+1)\big]. \tag{22}$$

For notational convenience, we let $\Delta = W(t) - W(t+1)$ and
$P = P(X(t+1), \tilde Y)$; then equation (22) may be written as

$$\Delta + \xi = P\Delta, \quad \text{where } \xi \ge 0, \tag{23}$$

that is, $(I - P)\Delta = -\xi$. Since every row sum of $P = (p_{ij})$ satisfies
$\sum_j p_{ij} \le 1 - s < 1$, $(I - P)^{-1} = \sum_{m \ge 0} P^m$ exists and all its elements are non-negative. Hence,

$$\Delta = -(I - P)^{-1}\xi \le 0, \tag{24}$$

and we obtain

$$W(t) \le W(t+1).$$

The vectors W(t) are in Euclidean n-space, and the sequence
{W(t)} is monotone increasing.* We show in section 4.8 that the
solution W(t) to equation (15) is bounded from above for all t. Hence,
the sequence {W(t)} converges to a limit W*. Now, it is clear from
Method II that $W^* = (w_1^*, \dots, w_n^*)$ is the solution to

$$w_i^* = \mathrm{Val}[A_i(W^*)], \qquad i = 1, \dots, n,$$

and, therefore, W* is the value of the TSG, that is, $W^* = \hat W$ (Theorem 1).

*One can verify that if W(t) = W(t+1), then W(t) is the value.
Hence, W(t) ≤ W(t+1), with strict inequality holding for at least one
component, unless the sequence has converged to its limit.

We consider the sequence {X(t)}. The vectors X(t) vary in a
compact set, and we may extract a convergent subsequence. Let $X^*$ be
the limit of such a convergent subsequence. From section 4.8, W(t)
is the optimal solution to a linear programming problem. By the
method used in section 3.6, we can establish that W(t) is a continuous
function of X(t). Further, $X_i(t+1)$ is an optimal strategy for P1 in the
game $A_i(W(t))$. From this fact and continuity, we may assert that $X_i^*$
is an optimal strategy in the game $A_i(W^*)$. Then, from Theorem 2,
$X^* = (X_1^*, \dots, X_n^*)$ is an optimal strategy for P1 in the TSG. We sum
up with the following

Theorem 4: The sequence {W(t)} converges to the value of the
TSG and the sequence {X(t)} converges to an optimal strategy for P1.

This  completes  our  discussion  of  two  methods  to  approximate 
the  value  and  optimal  strategies  for  a  TSG.  In  the  remainder  of  this 
chapter,  we  investigate  extensions  and  special  cases. 




4.5 A Modified Assumption

Throughout the discussion on the TSG, we have been using the
assumption

$$\text{(i)} \quad \sum_{j=1}^{n} p_{ij}^{kh} < 1, \quad \text{all } i, k, h.$$

In this section, we consider a slightly weaker assumption* than (i);
all other definitions and assumptions remain unchanged. For
convenience, we shall change the notation for transition probabilities from
$p_{ij}^{kh}$ to $q_{ij}^{kh}$. We allow

$$\sum_{j=1}^{n} q_{ij}^{kh} \le 1 \quad \text{(equality may hold for some or all } i, k, h\text{)}. \tag{25}$$

Thus, we permit a zero probability of termination before the next move
when the game is in state i and alternatives k and h are chosen by players
1 and 2 respectively. However, we impose the following regularity
condition on the $q_{ij}^{kh}$.

Assumption A: For all behavior strategies X and Y for players 1
and 2 respectively, the game terminates with probability one in a finite
number of moves from every state i (i = 1, ..., n).

Now, if assumption (i) is satisfied, then assumption A is trivially
satisfied.

*See also Denardo [1] and Derman [1]. They employ this weaker
assumption for a "terminating Markovian decision process".

Assumption A asserts that, for every behavior strategy, it is possible
to find a finite sequence of states leading from every state i to
termination of the game or, to put it another way, the states i = 1, ..., n
are transient for every behavior strategy. Let ${}^{r}q_{ij}^{kh}$ be the r-step
transition probability from state i to state j when decisions k and h are
chosen for state i and an arbitrary behavior strategy is used for states
other than i. Assumption A guarantees that there exists an integer N
such that

$$\sum_{j=1}^{n} {}^{N}q_{ij}^{kh} < 1, \quad \text{all } i, k, h.$$

Thus, $T^N$ is a contraction mapping and T has a unique fixed point (see
Kolmogorov and Fomin [1]). It follows that Theorems 1 and 2 are true,
and all of our results of section 4.3 are valid for transition probabilities
satisfying equation (25) and assumption A.
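Assumption A can be probed numerically for a given pair of stationary strategies: form the induced substochastic matrix and look for a power whose row sums are all below 1. The sketch checks only one strategy pair (assumption A quantifies over all of them), and the two-state chain, in which state 1 never terminates directly, is hypothetical.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def contraction_steps(q_bar, max_steps=100):
    """Smallest N such that every row of q_bar**N sums to less than 1, else None.

    q_bar is the substochastic matrix induced by ONE fixed pair of behavior
    strategies; assumption A requires this for every pair, which is not checked here.
    """
    power = q_bar
    for steps in range(1, max_steps + 1):
        if all(sum(row) < 1 for row in power):
            return steps
        power = mat_mul(power, q_bar)
    return None


# Hypothetical chain: state 1 always moves to state 2; state 2 returns to
# state 1 with probability 1/2 and otherwise terminates.
q_bar = [[F(0), F(1)], [F(1, 2), F(0)]]
print(contraction_steps(q_bar))     # 2
print(contraction_steps([[F(1)]]))  # None: this state is never left
```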
4.6 Interpretation of payoffs in ASW

To place these developments in context, let us return to the ASW
situation with the aforementioned objectives: (1) minimax the expected
duration of the game, and (2) maximin the probability of a catch. We
now seek to exhibit appropriate numerical values for the $a_i^{kh}$ which will
encompass these two objectives.

Consider first objective (1) and assume that player 1 is the
minimizing player. Suppose that the fixed behavior strategies
$X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_n)$ are used by players 1 and 2
respectively in G. These fixed strategies X and Y define an absorbing
Markov chain with transition probabilities

$$p_{ij}(X_i, Y_i) = \sum_{k=1}^{M_i} \sum_{h=1}^{N_i} x_i^k\, p_{ij}^{kh}\, y_i^h, \qquad i, j = 1, \dots, n, \tag{26}$$

and the probability of absorption in one move given state i is

$$1 - \sum_{j=1}^{n} p_{ij}(X_i, Y_i) > 0, \qquad i = 1, \dots, n.$$

Now let $w_i(X, Y)$ be the expected duration of $G_i$ when X and Y are used
in G. Then the following relationship obtains:*

$$\begin{aligned}
w_i(X, Y) &= 1 \cdot \Pr\{\text{terminate in one move} \mid \text{state } i\} + \sum_{j=1}^{n} \Pr\{\text{go to state } j \mid \text{state } i\}\,\big(1 + w_j(X, Y)\big) \\
&= 1 - \sum_{j=1}^{n} p_{ij}(X_i, Y_i) + \sum_{j=1}^{n} p_{ij}(X_i, Y_i)\,\big(1 + w_j(X, Y)\big),
\end{aligned}$$

$$w_i(X, Y) = 1 + \sum_{j=1}^{n} p_{ij}(X_i, Y_i)\, w_j(X, Y), \qquad i = 1, \dots, n. \tag{27}$$

*See Parzen [1]; Pr denotes "probability".

Therefore, setting

$$a_i^{kh} = 1, \quad \text{all } i, k, \text{ and } h,$$

it follows from equations (4), (5), and (27) that $\hat w_i$, the solution to equation
(5), is the minimax expected duration of $G_i$. We have thus formulated
objective (1).

To attain objective (2), a similar analysis shows that we should
define (player 1 is now the maximizing player):

$$a_i^{kh} = \text{the probability of a catch in one move, given } i, k, \text{ and } h. \tag{28}$$

Then $\hat w_i$ is the maximin probability of a catch for $G_i$. Recall that
$s_i^{kh} = 1 - \sum_{j=1}^{n} p_{ij}^{kh}$ is the probability of termination in one move given i,
k, and h. The probability of a catch, given i, k, and h, can be no
greater than $s_i^{kh}$; thus, $0 \le a_i^{kh} \le s_i^{kh}$. In case $a_i^{kh} = s_i^{kh}$, all i, k, h,
equation (5) has the trivial solution $\hat w_i = 1$, i = 1, ..., n; i.e.,
the submarine is caught with probability one because the game can only
terminate with a catch. If $a_i^{kh} < s_i^{kh}$, then $s_i^{kh} - a_i^{kh}$ is the non-zero
probability that the game terminates in one move without a catch, given i, k, h,
and $\hat w_i \le 1$ (i = 1, ..., n), strict inequality holding for at least one i.
With the indicated payoffs (28), the hunter-killer force maximizes and
the submarine minimizes the probability that the submarine is caught.
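For fixed strategies, both objectives reduce to a linear system of the form (27) with different right-hand sides: all ones for the expected duration, and the one-move catch probabilities for objective (2). A minimal sketch in exact arithmetic; the two-state matrix $\bar P$ is hypothetical and stands for the chain induced by one fixed pair of behavior strategies. Setting the catch probability equal to the termination probability $1 - \sum_j \bar p_{ij}$ in every state reproduces the trivial solution $w_i = 1$.

```python
from fractions import Fraction as F


def solve_chain(payoff, p_bar):
    """Solve w = payoff + P_bar w, i.e. (I - P_bar) w = payoff, by Gauss-Jordan elimination."""
    n = len(p_bar)
    aug = [[F(int(i == j)) - F(p_bar[i][j]) for j in range(n)] + [F(payoff[i])]
           for i in range(n)]
    for c in range(n):
        sel = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[sel] = aug[sel], aug[c]
        pivot = aug[c][c]
        aug[c] = [e / pivot for e in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [e - factor * g for e, g in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


def expected_duration(p_bar):
    """Objective (1): a_i^{kh} = 1, so w_i = 1 + sum_j p_bar_ij w_j (equation (27))."""
    return solve_chain([1] * len(p_bar), p_bar)


def catch_probability(catch, p_bar):
    """Objective (2): a_i^{kh} = one-move catch probability (equation (28))."""
    return solve_chain(catch, p_bar)


# Hypothetical chain for one fixed strategy pair: state 1 always moves to state 2;
# state 2 returns to state 1 with probability 1/2 and otherwise terminates.
p_bar = [[F(0), F(1)], [F(1, 2), F(0)]]
print(expected_duration(p_bar))                # durations 4 and 3
print(catch_probability([0, F(1, 2)], p_bar))  # every termination is a catch: 1 and 1
print(catch_probability([0, F(1, 4)], p_bar))  # half the terminations are catches
```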
4.7 A Constrained TSG

This section considers a constrained TSG. See Charnes [1]
for a discussion of two-person zero-sum constrained games and their
reduction to linear programming problems. By a constrained game,
we mean that each player's strategies are implicitly restricted to a
convex set (usually polyhedral) rather than arbitrarily chosen from the
unit simplex. For concreteness, suppose that player 1 is to maximize
the total expected payoff subject to a constraint on the expected duration
of the game. Our task is to find a restriction on P1's strategy which
will guarantee that the expected duration of the game is no greater than
a specified constant C ≥ 1. Of course, other types of constraints can
also be developed by employing the method which we propose here.

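As before, let $w_i(X, Y)$ be the expected duration of $G_i$ when the fixed behavior strategies $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$ are used by players 1 and 2, respectively. Then $w_i(X, Y)$ is the unique solution to the following system (see equation (27)):

$$ w_i(X, Y) = 1 + \sum_{j=1}^{n} p_{ij}(X_i, Y_i)\, w_j(X, Y), \qquad i = 1, \ldots, n, $$

where $p_{ij}(X_i, Y_i) = \sum_{k,h} x_i^k p_{ij}^{kh} y_i^h$. Let $r$ be a state with

$$ w_r(X, Y) = \max_{1 \le i \le n} w_i(X, Y). $$

Then

$$ w_r(X, Y) = 1 + \sum_{j=1}^{n} p_{rj}(X_r, Y_r)\, w_j(X, Y) \le 1 + \sum_{j=1}^{n} p_{rj}(X_r, Y_r)\, w_r(X, Y), $$

where $0 \le \sum_{j=1}^{n} p_{rj}(X_r, Y_r) < 1$. For a fixed $C \ge 1$, define

$$ \mathcal{X}_i = \Big\{ X_i : \sum_{j,k} x_i^k p_{ij}^{kh} \le 1 - \frac{1}{C},\ h = 1, \ldots, N_i;\ \sum_k x_i^k = 1,\ x_i^k \ge 0 \Big\}, \qquad i = 1, \ldots, n. \tag{29} $$

These are the desired constraints.

We now show that $w_i(X, Y) \le C$ for all $X_i \in \mathcal{X}_i$ and arbitrary strategies $Y_i$ $(i = 1, \ldots, n)$. To substantiate this claim, consider an arbitrary strategy $Y_i$. Since $\sum_h y_i^h = 1$, for all $X_i \in \mathcal{X}_i$

$$ \sum_{j=1}^{n} p_{ij}(X_i, Y_i) = \sum_{j,k,h} x_i^k p_{ij}^{kh} y_i^h \le 1 - \frac{1}{C}, \qquad i = 1, \ldots, n. $$

In particular,

$$ \sum_{j=1}^{n} p_{rj}(X_r, Y_r) \le 1 - \frac{1}{C}. $$

Therefore, solving the inequality for $w_r(X, Y)$,

$$ w_i(X, Y) \le w_r(X, Y) \le \frac{1}{1 - \left(1 - \frac{1}{C}\right)} = C, \qquad i = 1, \ldots, n. $$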

Thus, player 1 can limit the expected duration of $G$ to be no greater than $C$ by always choosing a strategy from $\mathcal{X}_i$ when the game is in state $i$.
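As a numerical check of this bound, the Python sketch below (an illustration, not part of the original method) forms $p_{ij}(X_i, Y_i)$ from per-alternative data, solves $w = 1 + Pw$ by Gaussian elimination, and compares the result with $1/(1 - \max_i \sum_j p_{ij})$, which plays the role of $C$ in the argument above. The function names and the numbers are hypothetical.

```python
def mixed_row(x, p_kh, y):
    """p_ij(X_i, Y_i) = sum_{k,h} x^k p_ij^{kh} y^h for a single state i.

    x, y  -- behavior strategies of players 1 and 2 in state i
    p_kh  -- p_kh[k][h][j] = p_ij^{kh}
    """
    n = len(p_kh[0][0])
    return [sum(x[k] * y[h] * p_kh[k][h][j]
                for k in range(len(x)) for h in range(len(y)))
            for j in range(n)]


def solve_linear(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def expected_duration(p):
    """Expected durations: solve w_i = 1 + sum_j p[i][j] w_j, i.e. (I - P) w = 1."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)] for i in range(n)]
    return solve_linear(a, [1.0] * n)


def duration_bound(p):
    """C = 1 / (1 - max_i sum_j p_ij); every w_i is at most this value."""
    return 1.0 / (1.0 - max(sum(row) for row in p))


# Hypothetical two-state example under fixed behavior strategies.
p = [[0.1, 0.2], [0.0, 0.4]]
print(expected_duration(p), duration_bound(p))
```

The bound is attained when every row has the same sum, since then every $w_i$ equals $1/(1 - \text{row sum})$.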

To solve for optimal strategies and the value with the additional restriction on the expected duration of the game, we adjoin the following constraints to L.P.(i, v(t)) $(i = 1, \ldots, n)$:

$$ \sum_{k,j} x_i^k p_{ij}^{kh} \le 1 - \frac{1}{C}, \qquad h = 1, \ldots, N_i. \tag{30} $$


With these additional constraints, there may be no feasible solution to L.P.(i, v(t)) for some $i$. However, from the above development, infeasibility of the augmented L.P.(i, v(t)) for some $i$ means that there is no behavior strategy for player 1 which satisfies the restriction on the expected duration of the game. This holds true for every $v(t)$ and will, therefore, be evident at the first iteration, when $t = 0$. On the other hand, if player 1 does not have a behavior strategy satisfying the requirement on the duration of the game, then, for some $i$, L.P.(i, v(0)) will be infeasible. Summing up, the augmented L.P.(i, v(0)) is feasible for each $i$ if and only if the constrained game has a solution (a value and optimal strategies). If the solution exists, it may be found from the augmented L.P.(i, v(t)) and the iterative technique developed in section 3.

A few comments on the choice of the constant $C$ are in order. First, we have required $C \ge 1$ in (29). If $C < 1$, then $1 - 1/C < 0$, so $\mathcal{X}_i = \emptyset$ $(i = 1, \ldots, n)$ and the augmented L.P.(i, v(t)) is infeasible for all $i$. This implies that no behavior strategy exists for player 1 which yields an expected duration less than one, an obvious fact. Second, we may also establish an upper bound on $C$. By assumption (i) and equation (3),

$$ \sum_{j=1}^{n} p_{ij}^{kh} \le 1 - s, \qquad \text{all } i, k, h. $$

Thus, for every behavior strategy $X$,

$$ \sum_{j,k} x_i^k p_{ij}^{kh} \le 1 - s, \qquad \text{all } i, h, $$

and the constraints (30) are redundant if $C \ge 1/s$. Intuitively, this means that no behavior strategy for player 1 yields an expected duration greater than $1/s$. Therefore, the constraints (30) are nontrivial if $C$ is chosen from the interval

$$ 1 \le C < \frac{1}{s}. $$
4.8 A TSG with Perfect Information

We return to our idealization of ASW. In this section, the hunter-killer force knows or is willing to assume certain behavior of the submarine. More precisely, we assume that player 2 is playing some fixed behavior strategy which is known to player 1; thus, the game is effectively a one-person game. For instance, the hunter-killer force might assume that the submarine takes evasive action. Another example is the assumption that the submarine takes evasive action but is moving toward some objective. In practice, one might find optimal tactics for the hunter-killer force under various assumptions about the behavior of the submarine and then use the set of tactics for the most plausible behavior. The merits of this approach are:

1. The analysis is greatly simplified.

2. Less data is required.

3. If the submarine has the assumed behavior, the total expected payoff will be at least as great as in the two-person TSG.

4. All of the tactical information available to the hunter-killer force can be used in the state specification. (In the two-person case, only the information available to both players can be used.)

5. The hunter-killer force has an optimal pure strategy.

Of  course,  the  main  disadvantage  of  this  approach  is  that  the  hunter- 
killer  force  must  have  information  on  the  behavior  of  the  submarine  or 
be  willing  to  act  as  if  it  did  and  take  the  attendant  risks. 
Evidently, we are interested in a TSG with perfect information. Thus, we assume that player 2 uses precisely one strategy (pure or mixed) which is known to player 1. Accordingly, suppose that player 2 uses the behavior strategy $Y = (Y_1, \ldots, Y_n)$. Define

$$ p_{ij}^k = \sum_{h=1}^{N_i} p_{ij}^{kh} y_i^h \quad \text{and} \quad a_i^k = \sum_{h=1}^{N_i} a_i^{kh} y_i^h. $$

These are now the transition probabilities and payoffs for player 1 in the TSG with perfect information. In this game with perfect information, the optimal strategies for player 1 are pure strategies (von Neumann and Morgenstern [1]). Thus, Shapley's functional equation (5) may be rendered as

$$ w_i = \max_{1 \le k \le M_i} \Big[ a_i^k + \sum_{j=1}^{n} p_{ij}^k w_j \Big], \qquad i = 1, \ldots, n. \tag{31} $$
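The averaging that defines $p_{ij}^k$ and $a_i^k$ is a weighted sum over player 2's alternatives. A minimal Python sketch for a single state, with a hypothetical function name and illustrative numbers:

```python
def collapse_state(p_kh, a_kh, y):
    """One-person data for player 1 in a state i once player 2's strategy is known.

    p_kh[k][h][j] = p_ij^{kh},  a_kh[k][h] = a_i^{kh},  y[h] = y_i^h.
    Returns (p_k, a_k) with p_k[k][j] = sum_h p_ij^{kh} y_i^h and
    a_k[k] = sum_h a_i^{kh} y_i^h.
    """
    p_k = [[sum(p_kh[k][h][j] * y[h] for h in range(len(y)))
            for j in range(len(p_kh[k][0]))]
           for k in range(len(p_kh))]
    a_k = [sum(a_kh[k][h] * y[h] for h in range(len(y)))
           for k in range(len(a_kh))]
    return p_k, a_k


# Hypothetical state: one alternative for player 1, two for player 2.
p_k, a_k = collapse_state([[[0.5, 0.1], [0.1, 0.3]]], [[0.2, 0.6]], [0.25, 0.75])
print(p_k, a_k)
```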


This functional equation is one of a much larger class that has been shown by Charnes [2] to be amenable to linear programming analysis. By means of a linear program, we establish the existence and uniqueness of a solution to equation (31). In addition, the optimal pure behavior strategies and state values, $w_i$, may be computed directly from the linear program.
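The text solves (31) by linear programming. For comparison only, a different and well-known technique, successive approximation (value iteration), also converges here: every row sum $\sum_j p_{ij}^k$ is less than one, so the right-hand side of (31) is a contraction in the maximum norm. The sketch below is not the method of this section; the function name and data are hypothetical.

```python
def value_iteration(a, p, tol=1e-12, max_iter=100_000):
    """Successive approximation for w_i = max_k [a_i^k + sum_j p_ij^k w_j].

    a[i][k] = a_i^k and p[i][k][j] = p_ij^k, with sum_j p_ij^k < 1 for all i, k,
    so the iterates converge from any starting point (here w = 0).
    """
    n = len(a)
    w = [0.0] * n
    for _ in range(max_iter):
        new = [max(a[i][k] + sum(p[i][k][j] * w[j] for j in range(n))
                   for k in range(len(a[i])))
               for i in range(n)]
        if max(abs(u - v) for u, v in zip(new, w)) < tol:
            return new
        w = new
    raise RuntimeError("value iteration did not converge")


# Hypothetical data: state 0 has two decisions, state 1 has one.
a = [[1.0, 0.0], [2.0]]
p = [[[0.5, 0.0], [0.0, 0.9]], [[0.0, 0.5]]]
print(value_iteration(a, p))
```

For these data the fixed point is $w_2 = 4$ and $w_1 = \max(1 + 0.5 w_1, 0.9 w_2) = 3.6$.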


In connection with the literature, equation (31) is closely related to two types of Markovian decision processes. The first may be called a terminating Markovian decision process. These processes have been studied under the modified assumption of section 4 by Derman [1] and Eaton and Zadeh [1]. Derman obtained a linear fractional program for a terminating Markovian decision process. This linear fractional program can be reduced to a linear program by a transformation due to Charnes and Cooper [6]. The resulting linear program is precisely equivalent to the dual of problem I below. Thus, Derman's viewpoint is, in a sense, "dual" to the approach taken here. A TSG is also equivalent to a discounted Markovian decision process. For such a process, one must solve the equations


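$$ w_i = \max_{1 \le k \le M_i} \Big[ a_i^k + \beta \sum_{j=1}^{n} q_{ij}^k w_j \Big], \qquad i = 1, \ldots, n, \tag{31a} $$

where $0 \le \beta < 1$ and the $q_{ij}^k$ are transition probabilities with $\sum_{j=1}^{n} q_{ij}^k = 1$. In our notation, we take $p_{ij}^k = \beta q_{ij}^k$, and we have

$$ \sum_{j=1}^{n} p_{ij}^k = \beta \sum_{j=1}^{n} q_{ij}^k = \beta < 1. $$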


Therefore, a TSG with perfect information has precisely the same structure as a discounted Markovian decision process. For studies on discounted processes, see Howard [1], Blackwell [1] and, with particular reference to equation (31a) and linear programming, see d'Epenoux [1], Balinski [2], and Denardo [1].
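The correspondence $p_{ij}^k = \beta q_{ij}^k$ is mechanical. A short Python sketch with hypothetical data and a hypothetical function name shows that every row of the rewritten data sums to $\beta$, i.e., the termination probability in one move is the constant $1 - \beta$:

```python
def discounted_to_tsg(q, beta):
    """Rewrite discounted data (31a) in the form of (31): p_ij^k = beta * q_ij^k.

    q[i][k][j] = q_ij^k, each row summing to 1, and 0 <= beta < 1.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return [[[beta * qij for qij in row] for row in qi] for qi in q]


q = [[[0.3, 0.7]], [[1.0, 0.0], [0.5, 0.5]]]   # hypothetical q_ij^k
p = discounted_to_tsg(q, 0.9)
print([[sum(row) for row in pi] for pi in p])
```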
We employ the following linear program to solve equation (31).

Problem I

$$ \min \sum_{i=1}^{n} w_i \tag{32} $$

subject to

$$ w_i - \sum_{j=1}^{n} p_{ij}^k w_j \ge a_i^k, \qquad k = 1, \ldots, M_i, \quad i = 1, \ldots, n. \tag{33} $$


As may be noted, the functional of this system serves to drive the values of $w_i$ to be the maximum over $k$ of the right-hand side of (31). Other functionals serving the same purpose could also be employed. The following two lemmas and Theorem 5 establish that the optimal solution to problem I exists, satisfies equation (31), and is unique.

Lemma 2: An optimal solution $w = (w_1, \ldots, w_n)$ to problem I exists.


Proof: It is sufficient to show that problem I has a feasible solution and that its functional is bounded from below.

First, let $w_i = C$, $i = 1, \ldots, n$, where $C$ is a constant. Inequality (33) becomes

$$ C \Big( 1 - \sum_{j=1}^{n} p_{ij}^k \Big) \ge a_i^k, \qquad \text{all } i, k. $$

But $1 - \sum_{j=1}^{n} p_{ij}^k > 0$ for all $i, k$. Thus, we may choose $C$ large enough to satisfy all of the above inequalities simultaneously, and problem I has a feasible solution. Let $w' = (w'_1, \ldots, w'_n)$ be a feasible solution to problem I and suppose that $w'_r \le w'_i$, $i = 1, \ldots, n$. Then $w'_r$ must satisfy the inequalities

$$ w'_r \ge a_r^k + \sum_{j=1}^{n} p_{rj}^k w'_j \ge a_r^k + \sum_{j=1}^{n} p_{rj}^k w'_r $$

and

$$ w'_r \Big( 1 - \sum_{j=1}^{n} p_{rj}^k \Big) \ge a_r^k, \qquad k = 1, \ldots, M_r. $$

Let

$$ Q = \min_{i,k} \frac{a_i^k}{1 - \sum_{j=1}^{n} p_{ij}^k}. $$

This minimum exists by assumptions (i) and (ii) on page . We now have

$$ w'_i \ge w'_r \ge Q, \qquad i = 1, \ldots, n. $$

Hence, (32) is bounded from below by $nQ$ for every feasible solution to problem I. By boundedness and feasibility, problem I has an optimal solution.
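The two constants used in the proof are easy to compute. The Python sketch below (hypothetical names and data) returns the feasible constant $C$, taken as the largest ratio $a_i^k / (1 - \sum_j p_{ij}^k)$, and the lower bound $Q$, and checks the constraints (33) directly:

```python
def lemma2_bounds(a, p):
    """Return (C, Q) from the proof of Lemma 2.

    a[i][k] = a_i^k and p[i][k][j] = p_ij^k with sum_j p_ij^k < 1.
    w_i = C for all i satisfies (33); every feasible w has w_i >= Q.
    """
    ratios = [a[i][k] / (1.0 - sum(p[i][k]))
              for i in range(len(a)) for k in range(len(a[i]))]
    return max(ratios), min(ratios)


def is_feasible(w, a, p, tol=1e-9):
    """Check (33): w_i - sum_j p_ij^k w_j >= a_i^k for every i and k."""
    return all(w[i] - sum(pij * wj for pij, wj in zip(p[i][k], w)) >= a[i][k] - tol
               for i in range(len(a)) for k in range(len(a[i])))


a = [[1.0, 0.0], [2.0]]                        # hypothetical a_i^k
p = [[[0.5, 0.0], [0.0, 0.9]], [[0.0, 0.5]]]    # hypothetical p_ij^k
C, Q = lemma2_bounds(a, p)
print(C, Q, is_feasible([C] * len(a), a, p))
```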

Lemma 3: Every optimal solution to problem I satisfies equation (31).

Proof by contradiction: Let $w = (w_1, \ldots, w_n)$ be an optimal solution and assume for some $i$, say $i = r$, that

$$ w_r > a_r^k + \sum_{j=1}^{n} p_{rj}^k w_j, \qquad k = 1, \ldots, M_r. $$
Theorem 5: The optimal solution to problem I exists, it satisfies equation (31), and it is unique.

Proof of uniqueness: Assume $w = (w_1, \ldots, w_n)$ and $\hat{w} = (\hat{w}_1, \ldots, \hat{w}_n)$ are both optimal solutions to problem I. From Lemma 3 and this assumption, there exists a set of integers $\{k(i)\}$ such that

$$ w_i = a_i^{k(i)} + \sum_{j=1}^{n} p_{ij}^{k(i)} w_j, \qquad i = 1, \ldots, n. \tag{34} $$

$\hat{w}$ must be a feasible solution to problem I for the set $\{k(i)\}$; thus

$$ \hat{w}_i \ge a_i^{k(i)} + \sum_{j=1}^{n} p_{ij}^{k(i)} \hat{w}_j, \qquad i = 1, \ldots, n. \tag{35} $$

Subtracting (34) from (35), we obtain

$$ \hat{w}_i - w_i \ge \sum_{j=1}^{n} p_{ij}^{k(i)} (\hat{w}_j - w_j), \qquad i = 1, \ldots, n. \tag{36} $$

Let $\{\hat{k}(i)\}$ be the set of integers which gives

$$ \hat{w}_i = a_i^{\hat{k}(i)} + \sum_{j=1}^{n} p_{ij}^{\hat{k}(i)} \hat{w}_j, \qquad i = 1, \ldots, n. $$

Then, since $w$ is also feasible, we have

$$ \hat{w}_i - w_i \le \sum_{j=1}^{n} p_{ij}^{\hat{k}(i)} (\hat{w}_j - w_j), \qquad i = 1, \ldots, n. \tag{37} $$

Consider two cases.

Case (1): Assume $\hat{w}_i - w_i < 0$ for some $i$. Let $r$ be an index at which $\hat{w}_i - w_i$ is smallest. Since $\sum_j p_{rj}^{k(r)} < 1$, the right-hand side of (36) for $i = r$ is at least $\sum_j p_{rj}^{k(r)} (\hat{w}_r - w_r) > \hat{w}_r - w_r$, so inequality (36) cannot hold for all $i$. Therefore, $\hat{w}_i - w_i \ge 0$ $(i = 1, \ldots, n)$.

Case (2): Assume $\hat{w}_i - w_i > 0$ for some $i$. By the symmetric argument at an index where $\hat{w}_i - w_i$ is largest, inequality (37) cannot hold for all $i$. Therefore, $\hat{w}_i - w_i \le 0$ $(i = 1, \ldots, n)$.

Thus, $\hat{w}_i - w_i = 0$ $(i = 1, \ldots, n)$ is the only possibility and, indeed, (36) and (37) are both satisfied when $\hat{w}_i - w_i = 0$ $(i = 1, \ldots, n)$. Hence, $\hat{w}_i = w_i$ $(i = 1, \ldots, n)$ and the theorem is true.


Since the solution to equation (31) is unique, we conclude that equation (31) is solved by problem I. Now, an optimal pure strategy for the TSG with perfect information is available from the solution to problem I or its dual. There may be more than one optimal pure strategy, since the right-hand side of equation (31) may be maximized for more than one $k$ and some $i$. However, alternate optima for the dual to problem I correspond to alternate optimal pure behavior strategies and vice versa. Thus, all of the optimal pure behavior strategies are available from the solution to problem I or its dual. Finally, one would normally solve for the optimal strategies from the dual to problem I, since it has fewer constraints than problem I and, therefore, requires less computational effort.
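Given the state values $w$, one simple way to list the optimal pure behavior strategies is to find, in each state, every decision that attains the maximum in (31). A Python sketch with a tolerance for ties (the tolerance, the function name and the data are assumptions for illustration):

```python
def optimal_decisions(w, a, p, tol=1e-9):
    """For each state i, every k attaining max_k [a_i^k + sum_j p_ij^k w_j] in (31).

    More than one k in a state signals alternate optimal pure behavior strategies.
    """
    result = []
    for i in range(len(w)):
        vals = [a[i][k] + sum(pij * wj for pij, wj in zip(p[i][k], w))
                for k in range(len(a[i]))]
        best = max(vals)
        result.append([k for k, v in enumerate(vals) if v >= best - tol])
    return result


# Hypothetical one-state data with a tie: both decisions give 2 at w = 2.
print(optimal_decisions([2.0], [[1.0, 0.5]], [[[0.5], [0.75]]]))
```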
4.9 An Example of Optimal Target Approach

We give a tactical example of a game with perfect information. Consider a situation where the searcher has a datum point of last contact, but he may or may not hold the contact on his sensors at each decision point in the pursuit. The searcher wishes to get into attack position. The states for this pursuit are determined by two observed factors, range to the datum and classification of the contact. For our purposes, range is measured in three increments, 1, 2, 3, and the classification is either a hold (H) or lost (L) contact. These two factors determine six states, 1H, 1L, 2H, 2L, 3H, 3L, where, for example, 1H means the searcher is at range increment 1 from datum and is holding the contact. When the pursuit has not terminated, it must be in one of these six states. In addition, we specify two terminal states, a permanent lost state ($L_0$) and a successful attack state ($S_0$). The searcher wishes to maximize the probability of arriving at state $S_0$.

Now there are four types of decisions: attack (A), decrease the range by one increment (D), increase the range by one increment (I), and stay at the present range (S). Not all of these decisions are allowed for each state. For example, the searcher cannot attack when the contact is (temporarily) lost. The permissible decisions for each state and the transition probabilities are given in Figure 3.1.
We briefly review the theory of section 4.8 in order to formulate a model for this example. Recall that $p_{ij}^k$ is the probability of transition to state $j$ given state $i$ and decision $k$. We number the nonterminal states $i = 1, \ldots, 6$ and let $w_i$ be the probability of termination in state $S_0$ starting from state $i$. From section 4.8, there is a policy (a decision for each state) which is simultaneously optimal for all starting states. Now, for each fixed policy, the probability of absorption in state $S_0$ is the probability of transition to $S_0$ in one step, plus the probability of going to some state other than $S_0$ and then being absorbed from there. The optimal probability of absorption is then given by

$$ w_i = \max_{k} \Big[ p_{iS_0}^k + \sum_{j=1}^{6} p_{ij}^k w_j \Big], \qquad i = 1, \ldots, 6. \tag{40} $$


Notice that the probability of transition to state $L_0$ does not appear in the above equation. Further, equation (40) is equivalent to equation (31) with $p_{iS_0}^k$ being the immediate payoff for decision $k$ and state $i$. Hence, (40) may be solved by the linear program (32) and (33). Less computational effort is required to solve the dual of (32) and (33), and we exhibit this dual below.


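$$
\begin{aligned}
\max \quad & \sum_{i=1}^{6} \sum_{k=1}^{M_i} p_{iS_0}^k x_{ik} \\
\text{subject to} \quad & \sum_{k=1}^{M_j} x_{jk} - \sum_{i=1}^{6} \sum_{k=1}^{M_i} p_{ij}^k x_{ik} = 1, && j = 1, \ldots, 6, \\
& x_{ik} \ge 0.
\end{aligned}
\tag{41}
$$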
In the preceding linear program, the variable $x_{ik}$ corresponds to state $i$ and decision $k$. As we have shown, the optimal positive variables will designate an optimal policy. (For each $i$, exactly one of the $x_{ik}$ will be positive.) Also, notice that only the nonterminal states are included in (41).
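For a fixed pure policy, only one $x_{ik}$ per state is positive, and the equality constraints of (41) then determine those variables uniquely. The Python sketch below (a hypothetical helper and hypothetical data; it evaluates one given policy rather than solving the linear program) computes them and the dual objective. For any fixed policy this objective equals the sum of that policy's state values, so at an optimal policy it equals the optimal value of the primal problem.

```python
def solve_linear(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def dual_for_policy(p, a, policy):
    """Dual variables and objective for one fixed pure policy.

    p[i][k][j] = p_ij^k, a[i][k] = immediate payoff (p_iS0^k in (41)),
    policy[i] = decision used in state i.  Solves
    x_j - sum_i p_ij^{policy[i]} x_i = 1 for j = 1..n.
    """
    n = len(p)
    mat = [[(1.0 if i == j else 0.0) - p[i][policy[i]][j] for i in range(n)]
           for j in range(n)]          # row j of (I - P^T)
    x = solve_linear(mat, [1.0] * n)
    objective = sum(a[i][policy[i]] * x[i] for i in range(n))
    return x, objective


a = [[1.0, 0.0], [2.0]]                        # hypothetical payoffs
p = [[[0.5, 0.0], [0.0, 0.9]], [[0.0, 0.5]]]    # hypothetical p_ij^k
print(dual_for_policy(p, a, [1, 0]))
```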

The data from Figure 3.1 are arranged in the following tableau format for the linear program (41).


[Figure 3.2: tableau for the linear program (41), with one column per variable $x_{ik}$ (state $i$, decision $k$), an objective row of coefficients $p_{iS_0}^k$, and six constraint rows, each with right-hand side 1.]

The  above  linear  program  was  solved  by  a  standard  code  on  the  CDC  1604 
computer.  The  resulting  optimal  policy  and  the  probability  of  a  successful 
attack  starting  from  each  state  (dual  variables)  are  shown  next. 


State   Optimal Decision   Prob. of a Successful Attack*
1H      A                  .962
1L      I                  .9
2H      A                  1
2L      I                  1
3H      D                  1
3L      S                  1

The  optimal  policy  is  then  as  follows:  if  the  contact  is  held  at  range  1 
or  2,  then  attack;  if  the  contact  is  lost  at  range  1  or  2,  then  increase 
the  range  by  one  unit;  if  the  contact  is  held  at  range  3,  then  decrease 
the  range  by  one  unit;  and  if  the  contact  is  lost  at  range  3,  then  stay  at 
range  3. 

This  example  demonstrates  the  utility  of  the  models  presented 
in  this  chapter.  They  may  be  used  to  describe  tactical  situations  where 
the  searcher  has  some  information  on  the  position  or  status  of  the 
submarine.  Of  course,  the  inclusion  of  the  states  in  the  model  permits 
the  use  of  such  tactical  information. 

We have used the perfect information model of the last section to describe a situation with two terminal states. This extension was possible because we were maximizing the probability of termination in state $S_0$. If, instead, we wished to minimize the time to arrive at state $S_0$, then the model would require some non-trivial modifications. In the next section, we modify the perfect information model to include more than one terminal state in general.

*The numbers in this example are somewhat optimistic.
4.10 Two Terminal States

We consider a game with perfect information and two terminal states. One of these terminal states, state 1, is favorable for P1, and the other terminal state, state $n$, is unfavorable for P1. When the pursuit has not terminated, it is in exactly one of the states $i = 2, \ldots, n - 1$. P1's objective is to minimize the expected cost of arriving at state 1. For example, the pursuit may terminate when P1 catches P2 or when P2 gets away from P1. Here P1 is interested in minimizing the cost of catching P2. Notice that we cannot achieve this objective with the model of section 4.8 because, in general, this model would simply minimize the cost of termination in either state 1 or $n$.

We develop a model with the objective just indicated. A device first introduced by Derman [1] is used to transform the given absorbing Markov chain to an equivalent irreducible chain. The cost of absorption in state 1 can then be expressed in terms of steady-state probabilities. The problem of finding an optimal policy becomes a linear fractional programming problem, and this program is reduced to a linear program by the method of Charnes and Cooper [6]. When state $n$ is deleted, the resulting linear program is precisely the dual of the program given in section 4.8. Hence, the approach taken here is, in a sense, "dual" to the approach used in section 4.8.
As before, we define the following parameters for states $i = 2, \ldots, n - 1$. Let $a_i^k$ be the cost of decision $k$ when the pursuit is in state $i$, and let $p_{ij}^k$ be the probability of transition to state $j$ $(j = 1, \ldots, n)$, given state $i$ and decision $k$ $(k = 1, \ldots, M_i)$. The $p_{ij}^k$ must satisfy

$$ p_{ij}^k \ge 0, \qquad \sum_{j=1}^{n} p_{ij}^k = 1, \qquad i = 2, \ldots, n - 1, \quad k = 1, \ldots, M_i. $$

We will define the transition probabilities and costs for states 1 and $n$ later.

We introduce the set $S$ of all randomized decision policies $D = \{D_{ik}\}$, where $D_{ik}$ is the probability of decision $k$ given state $i$. We require

$$ \sum_{k=1}^{M_i} D_{ik} = 1, \qquad D_{ik} \ge 0. $$

Then, for each fixed $D \in S$, the stationary transition probabilities $p_{ij}$ and the cost $a_i$ of passing through state $i$ are

$$ p_{ij} = \sum_{k=1}^{M_i} D_{ik} p_{ij}^k, \quad i = 2, \ldots, n-1, \ j = 1, \ldots, n; \qquad a_i = \sum_{k=1}^{M_i} D_{ik} a_i^k, \quad i = 2, \ldots, n-1. \tag{43} $$

Here $p_{ij}$ is the probability of transition to state $j$ given state $i$ for some fixed $D \in S$.

We require the following assumption:

Assumption A: From each state $i$ $(i = 2, \ldots, n - 1)$ and for all $D \in S$, state 1 is reached in a finite number of steps with probability one.

We expect Assumption A to hold in practical situations, since it excludes only policies which result in (i) cycling between states or (ii) termination in state $n$ only. Policies of type (i) or (ii) yield an infinite cost of arriving at state 1. Hence, we do not restrict the model by ruling out these possibilities.

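We introduce a device due to Derman [1] to transform the absorbing Markov chain into an equivalent irreducible chain. Accordingly, for all $D \in S$, we define the following transition probabilities for states 1 and $n$:

$$ p_{1j} = p_{nj} = \frac{1}{n-2}, \quad j = 2, \ldots, n-1; \qquad p_{11} = p_{1n} = p_{n1} = p_{nn} = 0. \tag{44} $$

The transition probability matrix for each fixed $D \in S$ is then

$$ P = \begin{pmatrix} 0 & \frac{1}{n-2} & \cdots & \frac{1}{n-2} & 0 \\ p_{21} & p_{22} & \cdots & p_{2,n-1} & p_{2n} \\ \vdots & \vdots & & \vdots & \vdots \\ p_{n-1,1} & p_{n-1,2} & \cdots & p_{n-1,n-1} & p_{n-1,n} \\ 0 & \frac{1}{n-2} & \cdots & \frac{1}{n-2} & 0 \end{pmatrix}. $$

As may be noted, when the pursuit arrives at either state 1 or $n$, it is started over again in one of the states $i = 2, \ldots, n - 1$ with uniform probability.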

From Assumption A and equation (44), it is clear that the set of states $\{1, \ldots, n\}$ is irreducible for all $D \in S$. We will depend heavily on this fact to formulate the objective function and the model. Let $p_{ij}^{(r)}$ be the $r$-step transition probability from state $i$ to $j$ $(i, j = 1, \ldots, n)$. Consider a pursuit which lasts exactly $m \ge 1$ steps and, for fixed $D \in S$, evolves according to the transition probabilities given by equations (43) and (44). Let $M_D(m, i)$ be the expected number of occurrences of state 1 when such a pursuit starts in state $i$, and let $C_D(m, i)$ be the expected cost of termination in state 1. When the pursuit starts in state $i$, the expected cost of the $r$th step is

$$ \sum_{j=1}^{n} p_{ij}^{(r)} a_j, \qquad i = 1, \ldots, n. $$

Hence, $C_D(m, i)$ is given by

$$ C_D(m, i) = \frac{\sum_{r=1}^{m} \sum_{j=1}^{n} p_{ij}^{(r)} a_j}{M_D(m, i)}, $$

where $m$ is sufficiently large to ensure that $M_D(m, i) \ne 0$. Let

$$ C_D = \lim_{m \to \infty} C_D(m, i), \qquad i = 1, \ldots, n, \quad D \in S. $$

Theorem 6 establishes that the above limit exists and is independent of the starting state $i$. Notice that $C_D$ is the expected cost of termination in state 1 regardless of the starting state $i$. We seek to find a $D \in S$ which minimizes $C_D$ over all $D \in S$.

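Theorem 6: The expected cost of termination in state 1, starting in any state $i$, is

$$ C_D = \frac{1}{\Pi_1} \sum_{j=1}^{n} \Pi_j a_j, $$

where $\Pi_1, \ldots, \Pi_n$ are the steady-state probabilities of the chain given by (43) and (44).

Proof: Since $M_D(m, i) = \sum_{r=1}^{m} p_{i1}^{(r)}$,

$$ C_D = \lim_{m \to \infty} \frac{\frac{1}{m} \sum_{r=1}^{m} \sum_{j=1}^{n} p_{ij}^{(r)} a_j}{\frac{1}{m} \sum_{r=1}^{m} p_{i1}^{(r)}}. $$

Since the set of states $\{1, \ldots, n\}$ is irreducible for each $D \in S$, the Mean Ergodic theorem holds, i.e.,

$$ \lim_{m \to \infty} \frac{1}{m} \sum_{r=1}^{m} p_{ij}^{(r)} = \Pi_j, \qquad j = 1, \ldots, n. $$

This limit is independent of $i$ $(i = 1, \ldots, n)$, and the $\Pi_j$ are the unique solution to (46). (These $\Pi_j$ also satisfy $\Pi_j > 0$, $j = 1, \ldots, n$.) Hence,

$$ \lim_{m \to \infty} \frac{1}{m} \sum_{r=1}^{m} \sum_{j=1}^{n} p_{ij}^{(r)} a_j = \sum_{j=1}^{n} \Pi_j a_j. $$

We also have*

$$ \lim_{m \to \infty} \frac{m}{M_D(m, i)} = \frac{1}{\Pi_1}, $$

where $1/\Pi_1$ is the mean recurrence time of state 1. This limit is well defined, since Assumption A guarantees $\Pi_1 > 0$.

Putting the above results together, we get

$$ C_D = \frac{1}{\Pi_1} \sum_{j=1}^{n} \Pi_j a_j. $$

This completes the proof.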
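Theorem 6 can be checked numerically. The Python sketch below (hypothetical function names and a hypothetical four-state chain under one fixed pure policy, with the costs of the two terminal states taken as zero) builds the restarted chain of (43) and (44), solves for the steady-state probabilities $\Pi_j$, and evaluates $C_D$. With unit costs on states $2, \ldots, n-1$, $C_D$ is the expected number of moves spent in those states per arrival at state 1.

```python
def solve_linear(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def restarted_chain(p_trans, p_succ, p_fail):
    """n x n matrix of (43)-(44); order: [state 1, states 2..n-1, state n].

    p_trans[i][j]: moves among states 2..n-1; p_succ[i], p_fail[i]: moves
    from state i+2 to states 1 and n under the fixed policy.
    """
    m = len(p_succ)
    restart = [0.0] + [1.0 / m] * m + [0.0]      # uniform restart, eq. (44)
    rows = [restart]
    for i in range(m):
        rows.append([p_succ[i]] + list(p_trans[i]) + [p_fail[i]])
    rows.append(restart[:])
    return rows


def stationary(P):
    """Solve Pi = Pi P with sum(Pi) = 1 (last balance equation replaced)."""
    n = len(P)
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    A[-1] = [1.0] * n
    return solve_linear(A, [0.0] * (n - 1) + [1.0])


def expected_cost_theorem6(P, a):
    """C_D = (1 / Pi_1) * sum_j Pi_j a_j, with state 1 at index 0."""
    pi = stationary(P)
    return sum(pj * aj for pj, aj in zip(pi, a)) / pi[0]


P = restarted_chain([[0.0, 0.3], [0.4, 0.0]], [0.5, 0.4], [0.2, 0.2])
print(expected_cost_theorem6(P, [0.0, 1.0, 1.0, 0.0]))
```

For these data a direct renewal calculation gives $C_D = 135/61 \approx 2.213$.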


*See Parzen [1].

We want to find a $D \in S$ which minimizes $C_D$ over all $D \in S$. From Theorem 6 and equations (42) and (43), $D$ is an optimal solution to the following nonlinear programming problem.


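$$
\begin{aligned}
\min \quad & \frac{1}{\Pi_1} \sum_{j=1}^{n} \sum_{k=1}^{M_j} \Pi_j D_{jk} a_j^k \\
\text{subject to} \quad & \Pi_j - \sum_{i=1}^{n} \sum_{k=1}^{M_i} \Pi_i D_{ik} p_{ij}^k = 0, && j = 1, \ldots, n, \\
& \sum_{j=1}^{n} \Pi_j = 1, \\
& \sum_{k=1}^{M_j} D_{jk} = 1, && j = 1, \ldots, n, \\
& \Pi_j \ge 0, \quad D_{jk} \ge 0.
\end{aligned}
\tag{47}
$$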


We transform problem (47) into a linear fractional programming problem by means of the following change of variables. Let

$$ x_{jk} = \Pi_j D_{jk}, \qquad j = 1, \ldots, n; \quad k = 1, \ldots, M_j. \tag{48} $$

From (48) and $\sum_{k=1}^{M_j} D_{jk} = 1$, we get

$$ \Pi_j = \sum_{k=1}^{M_j} x_{jk}, \qquad j = 1, \ldots, n. $$


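For convenience, in the following formulation we have $k = 1$ for states 1 and $n$, and $p_{ij}^1$ $(i = 1, n)$ is then given by equation (44). Problem (47) becomes

$$
\begin{aligned}
\min \quad & \frac{\sum_{j=1}^{n} \sum_{k=1}^{M_j} x_{jk} a_j^k}{\sum_{k=1}^{M_1} x_{1k}} \\
\text{subject to} \quad & \sum_{k=1}^{M_j} x_{jk} - \sum_{i=1}^{n} \sum_{k=1}^{M_i} x_{ik} p_{ij}^k = 0, && j = 1, \ldots, n, \\
& \sum_{j=1}^{n} \sum_{k=1}^{M_j} x_{jk} = 1, && \text{(49a)} \\
& x_{jk} \ge 0.
\end{aligned}
\tag{49}
$$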

Clearly  the  transformation  (48)  is  one-to-one  between  optimal  solutions 
to  (47)  and  (49).  Hence,  we  may  solve  (47)  by  solving  (49). 

We use the method of Charnes and Cooper [6] to transform problem (49) to an equivalent linear program. To establish this equivalence, we observe that the convex set of feasible solutions to (49) is bounded and non-empty. Further, $\Pi_1 = \sum_{k=1}^{M_1} x_{1k} > 0$ for all feasible solutions to (49). Hence, the following transformation is one-to-one between problems (49) and (51).


$$ y_{jk} = t\, x_{jk}, \qquad t \sum_{k=1}^{M_1} x_{1k} = 1, \qquad \sum_{j=1}^{n} \sum_{k=1}^{M_j} y_{jk} = t. \tag{50} $$


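Using (50), the equivalent linear program to (49) is¹

$$
\begin{aligned}
\min \quad & \sum_{j=1}^{n} \sum_{k=1}^{M_j} y_{jk} a_j^k \\
\text{subject to} \quad & \sum_{k=1}^{M_j} y_{jk} - \sum_{i=1}^{n} \sum_{k=1}^{M_i} y_{ik} p_{ij}^k = 0, && j = 1, \ldots, n, \\
& \sum_{k=1}^{M_1} y_{1k} = 1, \\
& y_{jk} \ge 0.
\end{aligned}
\tag{51}
$$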
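The effect of (50) on the objective can be seen in a few lines. In the Python sketch below (hypothetical names and numbers; only the objective mapping is illustrated, and the balance constraints of (49) are not checked), scaling by $t = 1 / \sum_k x_{1k}$ turns the ratio objective of (49) into the linear objective of (51) and makes $\sum_{j,k} y_{jk} = t$:

```python
def charnes_cooper(x, a):
    """Apply (50) to a point x of (49); state 1 is index 0.

    x[j][k] = x_jk and a[j][k] = a_j^k.  Returns (t, y, ratio, linear), where
    ratio is the objective of (49) at x and linear is the objective of (51) at y.
    """
    t = 1.0 / sum(x[0])                       # t * sum_k x_1k = 1
    y = [[t * xjk for xjk in xj] for xj in x]
    numerator = sum(xjk * ajk for xj, aj in zip(x, a) for xjk, ajk in zip(xj, aj))
    ratio = numerator / sum(x[0])
    linear = sum(yjk * ajk for yj, aj in zip(y, a) for yjk, ajk in zip(yj, aj))
    return t, y, ratio, linear


t, y, ratio, linear = charnes_cooper([[0.2], [0.3, 0.1], [0.4]],
                                     [[0.0], [1.0, 2.0], [3.0]])
print(t, ratio, linear)
```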


We make one further reduction of problem (51). Actually, there are no decisions to be made when the pursuit is in the terminal states $i = 1$ or $n$. Hence, we eliminate the variables $y_{11}$ and $y_{n1}$ from problem (51). By means of equation (44) and some algebra, problem (51) is equivalent² to

$$
\begin{aligned}
\min \quad & \sum_{j=2}^{n-1} \sum_{k=1}^{M_j} y_{jk} a_j^k \\
\text{subject to} \quad & \sum_{k=1}^{M_j} y_{jk} - \sum_{i=2}^{n-1} \sum_{k=1}^{M_i} y_{ik} \Big[ p_{ij}^k + \frac{p_{i1}^k + p_{in}^k}{n-2} \Big] = 0, && j = 2, \ldots, n-1, \\
& \sum_{i=2}^{n-1} \sum_{k=1}^{M_i} y_{ik} p_{i1}^k = 1, \\
& y_{jk} \ge 0.
\end{aligned}
\tag{52}
$$

¹The constraint (49a) becomes $\sum_{j=1}^{n} \sum_{k=1}^{M_j} y_{jk} = t$; since this constraint is redundant in (51), we have omitted it.

²For convenience, we have taken $a_1^1 = a_n^1 = 0$.
From problem (52), we obtain a final result concerning the nature of the optimal decision policy. The optimal policy is characterized by

$$ D_{jk}^* = \begin{cases} 1 & \text{for } k = k_j, \\ 0 & \text{for } k \ne k_j, \end{cases} \qquad j = 2, \ldots, n - 1, \tag{53} $$

where $k_j$ is some decision for state $j$.

Of course, (53) says that a "pure policy" is optimal, i.e., for each state pick some alternative with probability one. Equation (53) follows from the following observations. From the constraints of (52), since every term is non-negative,

$$ \sum_{k=1}^{M_j} y_{jk} \ge \frac{1}{n-2} \sum_{i=2}^{n-1} \sum_{k=1}^{M_i} y_{ik} p_{i1}^k = \frac{1}{n-2}, \qquad j = 2, \ldots, n - 1; $$

hence,

$$ \sum_{k=1}^{M_j} y_{jk} > 0, \qquad j = 2, \ldots, n - 1. \tag{54} $$

Now one of the equality constraints in (52) is redundant. This may be verified by summing over the first $n - 2$ constraints, which gives the identity $0 = 0$. Hence, (52) has, at most, $n - 2$ linearly independent constraints (excepting non-negativity conditions); and, hence, a basic feasible solution has at most $n - 2$ positive variables. Further, at least one basic feasible solution must be optimal. By (54) and the fact that at most $n - 2$ variables can be positive in a basic optimal solution, we have
$$ y_{jk}^* > 0 \ \text{for } k = k_j, \qquad y_{jk}^* = 0 \ \text{for } k \ne k_j, \qquad j = 2, \ldots, n - 1, $$

where $k_j$ is some decision for state $j$.

Now, by the transformations set up between the $D_{jk}$ variables and the $y_{jk}$ variables (by (48) and (50), $D_{jk} = y_{jk} / \sum_{k'} y_{jk'}$), we conclude that our assertion (53) is correct.
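For a fixed pure policy, only one $y_{jk}$ per state can be positive, and the equality constraints of (52) then determine it. The Python sketch below (a hypothetical helper and a hypothetical chain with two non-terminal states) drops the redundant last balance equation, keeps the normalization, and solves. Here $y_j$ can be read as the expected number of visits to state $j$ per arrival at state 1; with unit costs, the objective is the expected time to a successful termination.

```python
def solve_linear(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def y_for_pure_policy(p_trans, p_succ, p_fail, cost):
    """Solve the equality constraints of (52) for one fixed pure policy.

    States 2..n-1 are indexed 0..m-1 (m = n - 2).  p_trans[i][j], p_succ[i],
    p_fail[i] and cost[i] are p_ij, p_i1, p_in and a_i for the chosen decisions.
    """
    m = len(p_succ)
    rows = []
    for j in range(m - 1):                    # last balance row is redundant
        rows.append([(1.0 if i == j else 0.0)
                     - (p_trans[i][j] + (p_succ[i] + p_fail[i]) / m)
                     for i in range(m)])
    rows.append(list(p_succ))                 # sum_i y_i p_i1 = 1
    y = solve_linear(rows, [0.0] * (m - 1) + [1.0])
    return y, sum(c * yi for c, yi in zip(cost, y))


y, objective = y_for_pure_policy([[0.0, 0.3], [0.4, 0.0]], [0.5, 0.4],
                                 [0.2, 0.2], [1.0, 1.0])
print(y, objective)
```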

We have shown how to formulate a model for situations involving two terminal states and the objective of minimizing the cost to arrive at one of these terminal states. In section 4.9, we gave an example of a two-terminal-state situation. If our objective for that problem were to minimize the time to complete a successful attack, then the model presented in this section would be applicable. All immediate payoffs, $a_i^k$, are taken equal to one to achieve the "time" objective. As may be noted, the approach taken here results in a linear program which is the dual of the linear program obtained from the functional equation approach.
4.11 A Finite Terminating Stochastic Game

We return to our idealization of ASW and introduce the additional rule: the pursuit is terminated in a specified finite number of moves if it has not already reached a terminal state. In ASW, this forced termination may be caused by one of a number of factors, such as resource limitations or submarine endurance time when submerged. This means we have a finite version of a TSG. It terminates in $m$ moves or in a terminal state, whichever occurs first.

The following notation is introduced for the finite TSG. Consider a collection of mutually exclusive and collectively exhaustive states numbered $i = 1, \ldots, N$. Terminal states are included in this collection, and the finite TSG must be in one and only one of these states at each move $t = 1, \ldots, m$. When the game is in state $i$, we number the available alternatives for players 1 and 2, respectively, $k = 1, \ldots, M_i$ and $h = 1, \ldots, N_i$. If the finite TSG is in state $i$ at move $t$ and players 1 and 2 choose alternatives $k$ and $h$, respectively, then the payoff from player 2 to player 1 is

$$ a_i^{kh}(t), \qquad i = 1, \ldots, N, \quad k = 1, \ldots, M_i, \quad h = 1, \ldots, N_i, \quad t = 1, \ldots, m, $$

and the game transits to state $j$ with probability

$$ p_{ij}^{kh}(t), \qquad i, j = 1, \ldots, N, \quad k = 1, \ldots, M_i, \quad h = 1, \ldots, N_i, \quad t = 1, \ldots, m. $$
Since the states are assumed to be mutually exclusive and collectively exhaustive, the $p_{ij}^{kh}(t)$ must satisfy

$$\sum_{j=1}^{N} p_{ij}^{kh}(t) = 1; \qquad \text{also,}\ p_{ij}^{kh}(t) \ge 0 \quad \text{for all } i, j, k, h, t.$$

Finally, we assume that the players are informed of both the state and the move before they choose their strategies.

Note that the above structure differs from that of the infinite TSG in the following respects. In the finite TSG:

1. The payoffs and transition probabilities may depend on the move.

2. There may be a zero probability of termination in one move.

3. The players know the state of the game and the move when they choose their strategy for the next move.

Thus, if the game is finite, more flexibility may be permitted in the model, i.e., items 1 and 2 above.

Next,  we  show  how  the  value  and  optimal  strategies  of  a  finite 
TSG may be computed. As will be noted, the methods and representations developed here are closely related to those of the infinite game.

Let $v_i(t)$ be the minimax of the total expected payments received by player 1 from the remaining $m - t$ moves when the game is in state $i$ at move $t$, and let $V(t) = (v_1(t), \ldots, v_N(t))$. Now, $V(m)$ is the minimax of the total expected payments with zero moves to go; accordingly, $V(m) = 0$. For convenience, we introduce the $M_i \times N_i$ matrix $A_{it}(\alpha)$, where $\alpha = (\alpha_1, \ldots, \alpha_N)$, whose $k$-$h$ element is

$$a_i^{kh}(t) + \sum_{j=1}^{N} p_{ij}^{kh}(t)\,\alpha_j, \qquad k = 1, \ldots, M_i, \quad h = 1, \ldots, N_i.$$

The minimax of the expected payments with one move left, $V(m-1)$, is clearly given by the following set of equations:

$$v_i(m-1) = \operatorname{Val}[A_{im}(V(m))] = \operatorname{Val}[A_{im}(0)], \qquad i = 1, \ldots, N.$$

Let $\hat X_i(m)$ and $\hat Y_i(m)$ be optimal strategies for players 1 and 2, respectively, in the game $A_{im}(V(m))$. Then it follows that $\hat X(m) = (\hat X_1(m), \ldots, \hat X_N(m))$ and $\hat Y(m) = (\hat Y_1(m), \ldots, \hat Y_N(m))$ are optimal strategies in the $m$th move of the finite TSG. Since the payoffs and transition probabilities depend only on the move and the state, which are known to the players, it may be established by induction that the following relationship obtains:


$$v_i(t-1) = \operatorname{Val}[A_{it}(V(t))], \qquad i = 1, \ldots, N, \quad t = 1, \ldots, m. \tag{54}$$

According to equation (54), $V(0)$ is the value of the finite TSG.

Let $\hat X_i(t)$ and $\hat Y_i(t)$ be optimal strategies for players 1 and 2, respectively, in the game $A_{it}(V(t))$, $i = 1, \ldots, N$, $t = 1, \ldots, m$. Then it is clear from equation (54) that $\hat X(t) = (\hat X_1(t), \ldots, \hat X_N(t))$ and $\hat Y(t) = (\hat Y_1(t), \ldots, \hat Y_N(t))$ are optimal strategies in move $t$ of the finite TSG. Note that $\hat X(t)$ and $\hat Y(t)$ depend, in general, on the move of the game and are, therefore, not behavior strategies. In general, behavior strategies are not optimal in a finite TSG.

Returning to the linear program L.P.$(i, V(t))$, we see that it is a linear programming formulation of the game $A_{it}(V(t))$ with the payoffs and transition probabilities depending on $t$. To compute the value and optimal strategies for the finite TSG, we can start with $V(m) = 0$ and compute $\hat X(m)$, $\hat Y(m)$, and $U(m)$ from $\{\text{L.P.}(i, V(m))\}$, $i = 1, \ldots, N$. Now, set $V(m-1) = U(m)$ and compute $\hat X(m-1)$, $\hat Y(m-1)$, and $U(m-1)$, and so on. Thus, the value and optimal strategies may be computed recursively by linear programming.
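The backward recursion just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's procedure: to stay self-contained it replaces the linear program L.P.$(i, V(t))$ by a closed-form solution that is valid only when every stage game $A_{it}(V(t))$ is $2 \times 2$; larger stage games would need an LP solver. The names `val_2x2` and `solve_finite_tsg`, and the two-state example (a contact state in which matching choices pay 1 and end the pursuit, plus an absorbing terminal state), are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def val_2x2(m: Matrix) -> Tuple[float, float, float]:
    """Value of a 2x2 zero-sum game in which the row player (player 1) maximizes.

    Returns (value, p, q): p = probability player 1 uses row 0,
    q = probability player 2 uses column 0.
    """
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best pure guarantee for player 1
    upper = min(max(a, c), max(b, d))  # best pure guarantee for player 2
    if lower == upper:  # saddle point: pure strategies are optimal
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        q = 1.0 if max(a, c) <= max(b, d) else 0.0
        return lower, p, q
    denom = a - b - c + d  # nonzero whenever there is no saddle point
    return (a * d - b * c) / denom, (d - c) / denom, (d - b) / denom


def solve_finite_tsg(
    n_states: int,
    m: int,
    payoff: Callable[[int, int, int, int], float],
    trans: Callable[[int, int, int, int, int], float],
) -> Tuple[List[float], Dict[Tuple[int, int], Tuple[float, float]]]:
    """Backward recursion: V(m) = 0 and v_i(t-1) = Val[A_it(V(t))], t = m, ..., 1.

    payoff(t, i, k, h) plays the role of a_i^{kh}(t) and trans(t, i, k, h, j)
    that of p_ij^{kh}(t). Returns V(0) and the mixed strategies (p, q) per (t, i).
    """
    v = [0.0] * n_states  # V(m): no moves to go
    strategies: Dict[Tuple[int, int], Tuple[float, float]] = {}
    for t in range(m, 0, -1):
        new_v = []
        for i in range(n_states):
            # k-h element of A_it(V(t)): a_i^{kh}(t) + sum_j p_ij^{kh}(t) v_j(t)
            stage = [
                [payoff(t, i, k, h) + sum(trans(t, i, k, h, j) * v[j] for j in range(n_states))
                 for h in range(2)]
                for k in range(2)
            ]
            value, p, q = val_2x2(stage)
            new_v.append(value)
            strategies[(t, i)] = (p, q)
        v = new_v  # now holds V(t-1)
    return v, strategies


CONTACT, TERMINAL = 0, 1


def payoff(t: int, i: int, k: int, h: int) -> float:
    # Hypothetical: player 1 gains 1 when both players pick the same alternative.
    return 1.0 if i == CONTACT and k == h else 0.0


def trans(t: int, i: int, k: int, h: int, j: int) -> float:
    # A match ends the pursuit; otherwise the contact persists. TERMINAL is absorbing.
    target = TERMINAL if (i == TERMINAL or k == h) else CONTACT
    return 1.0 if j == target else 0.0


v0, strat = solve_finite_tsg(2, 3, payoff, trans)
print(v0)  # [0.875, 0.0]
```

In this toy game $v_{\text{CONTACT}}(0) = 1 - 2^{-3}$, the probability that the choices match at least once in three moves when both players mix evenly at every move.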


CHAPTER  V  -  MULTIPLE  CONTACT  ALLOCATION  MODELS 


5.1  Introduction

We develop models for the allocation of hunter-killer forces to multiple contact areas. The central problem is to determine an optimal division of effort between several contacts, subject to typical constraints. To focus attention on the ideas, we consider hunter-killer operations which involve at least two separate contact areas. For the first models, we assume that effort is allocated to the contact areas only once during the planning horizon. We then relax this restriction and formulate a dynamic allocation model. Before introducing these models, we discuss the predominant features of the tactical situations which will be considered.

In a wide variety of military problems, the force level required to accomplish a given military mission is uncertain when the allocation of forces is made. One of the primary causes of this uncertainty is a lack of information on enemy forces and capabilities. To reflect this uncertainty in the model, we assume that the amount of effort which is required to accomplish a specified military mission in each area is a random variable with a known joint cumulative distribution function (c.d.f.). This c.d.f. may be rather difficult to determine in practice. Nevertheless, we assume that it can be determined; sensitivity studies can then be conducted to determine the effects of estimation errors and data variations. After the models are formulated, we can also ascertain the effect of treating the random requirements as deterministic quantities.

Next, we introduce the objective functions which will be employed. These objectives are oriented toward optimizing a measure of overall mission success. From the specification of the random requirements, we can readily relate individual mission success to overall effectiveness, for we have assumed that the military mission in each tactical area can be accomplished if the allocated force level exceeds the observed random level. Hence, our objective functions measure the "difference" between allocated levels and the random requirements. In particular, the following two objectives are used:

1.  Maximize  the  probability  that  all  allocated  force  levels 
simultaneously  exceed  their  random  requirements.  This  is  equivalent 
to  maximizing  the  probability  that  all  missions  are  simultaneously 
accomplished. 

2.  Minimize  the  total  expected  shortage  between  allocated  and 
required  levels. 

In some situations, time may be an important measure of effectiveness. For these cases, the following objective is employed:

3. Minimize the expected distribution time to achieve a specified probability that all requirements are met.

Each  of  the  above  objectives  will  be  studied  for  the  "one-shot" 
allocation  models.  Objective  (3)  is  the  only  one  which  is  employed  for 
the  dynamic  model. 

We have introduced the requirements and objectives which will be taken for multiple contact situations. Now we discuss a measure of available effort. We measure the available effort in some meaningful unit such as a ship, a surface attack unit, or one flying hour. If several types of effort are available, then all effort is measured in terms of a single "standard" unit; however, the models could be extended to include different types of effort. Depending on the measurement adopted, effort may be treated as continuous or discrete. For instance, if a unit of effort is one flying hour, then effort may be treated as continuous. On the other hand, a unit of one ship will usually require a discrete treatment. Both discrete and continuous measurements will be studied for most of our models.
mission in area $j$ leads to the interpretation that we are maximizing the probability that all missions are simultaneously accomplished. The restrictions in model (1) are on the total amount of effort available and on the non-negativity of each individual allocation $x_j$.

The burden of optimization in (1) is placed on the objective function. Later, we consider models with more complicated constraints and a simpler objective function. This model embodies the essentials of H. A. Simon's [1] satisficing approach, for we maximize the probability that a specified goal is reached. In addition, this objective is similar to the one used by Charnes and Cooper [4] in their so-called "P-model". Next, we discuss methods for solving problem (1).

We apply the Kuhn-Tucker conditions of convex programming to (1). To employ these conditions, we require that $F$ be continuously differentiable. Without loss of generality, we may replace the inequality in (1.1) by an equality, since $F$ is a c.d.f. and therefore a monotone non-decreasing function. The Kuhn-Tucker necessary conditions are the following:* if $X = (x_1, x_2, \ldots, x_n)$ is an optimal solution to (1), then there exists a scalar $\mu$ such that $X$ and $\mu$ satisfy

* Of course, the "constraint qualification" is satisfied by the constraints of (1).
$$\frac{\partial F}{\partial x_j} - \mu \le 0, \qquad j = 1, \ldots, n \tag{2}$$

$$x_j \left( \frac{\partial F}{\partial x_j} - \mu \right) = 0, \qquad j = 1, \ldots, n \tag{3}$$

$$\sum_{j=1}^{n} x_j = a, \qquad x_j \ge 0, \quad j = 1, \ldots, n \tag{4}$$

If $F$ is a concave function, then the above conditions are also sufficient for $X$ to be an optimal solution to (1). These equations are rather difficult to solve in general because (2) and (3) are usually nonlinear. Nevertheless, we will apply these conditions to a special case of (1), but first we examine model (1) when $d_1, \ldots, d_n$ are independently distributed.

Numerous tactical problems have independent $d_1, \ldots, d_n$. We would expect independence when an allocation to one area does not have an appreciable spillover effect on other areas. Indeed, contact areas are often widely separated, and no interaction occurs between areas. Furthermore, independence would probably be required in order to determine $F$ empirically. With the assumption that $d_1, \ldots, d_n$ are independently distributed, we obtain

$$F(x_1, \ldots, x_n) = F_1(x_1)\,F_2(x_2) \cdots F_n(x_n), \tag{5}$$

where $F_j(x_j) = \Pr(d_j \le x_j)$.

Now, we may maximize the logarithm of the function in (5), since the log is a monotone increasing transformation and (5) is non-negative. With this transformation, problem (1) becomes

$$\begin{aligned} \max\ & \sum_{j=1}^{n} \log F_j(x_j) \\ \text{subject to: } & \sum_{j=1}^{n} x_j = a, \\ & x_j \ge 0, \quad j = 1, \ldots, n. \end{aligned} \tag{6}$$

Perhaps the most general method available to solve (6) is dynamic programming. It is especially useful when the $x_j$ are required to be non-negative integers. Since the application of dynamic programming to allocation problems has been extensively studied (see Bellman and Dreyfus [1]), we do not dwell on this method here. Instead, we turn to an important special case.
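As a rough illustration of the dynamic programming approach mentioned above (a generic stage-by-stage recursion, not the specific treatment in Bellman and Dreyfus [1]), the sketch below maximizes $\sum_j \log F_j(x_j)$ over non-negative integers with $\sum_j x_j = a$, processing one area at a time. The names `dp_allocate` and `log_uniform_cdf` are hypothetical. As test data we borrow the capacities $c_j = 2, 3, 9, 9$ and total $c = 12$ from the uniform example later in this section, but restricted to integer allocations, so the optimum differs slightly from the continuous one.

```python
import math
from typing import Callable, List, Tuple


def dp_allocate(gains: List[Callable[[int], float]], total: int) -> Tuple[float, List[int]]:
    """Maximize sum_j gains[j](x_j) over integers x_j >= 0 with sum_j x_j = total.

    best[s] is the best objective attainable by the areas processed so far
    using exactly s units (-inf if s is unattainable).
    """
    neg_inf = -math.inf
    best = [0.0] + [neg_inf] * total  # before any area, only s = 0 is attainable
    choices: List[List[int]] = []
    for g in gains:
        new_best = [neg_inf] * (total + 1)
        arg = [0] * (total + 1)
        for s in range(total + 1):
            for x in range(s + 1):  # x units go to the current area
                if best[s - x] == neg_inf:
                    continue
                val = best[s - x] + g(x)
                if val > new_best[s]:
                    new_best[s], arg[s] = val, x
        best = new_best
        choices.append(arg)
    # Walk back through the stages to recover an optimal allocation.
    alloc, s = [], total
    for arg in reversed(choices):
        alloc.append(arg[s])
        s -= arg[s]
    alloc.reverse()
    return best[total], alloc


def log_uniform_cdf(lo: float, hi: float) -> Callable[[int], float]:
    """log F_j(x) for a requirement uniform on [lo, hi]; log 0 is taken as -inf."""
    def g(x: int) -> float:
        f = min(max((x - lo) / (hi - lo), 0.0), 1.0)
        return math.log(f) if f > 0 else -math.inf
    return g


gains = [log_uniform_cdf(0, cap) for cap in (2, 3, 9, 9)]
value, alloc = dp_allocate(gains, 12)
print(alloc, math.exp(value))
```

With integer units the first two areas are again filled to their caps, and the remaining 7 units are split 3 and 4, giving a joint success probability of $12/81$.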

We now study the special case where the random variables $d_j$ are independent and uniformly distributed between $a_j$ and $b_j$. A uniform distribution implies that the actual requirement occurs at random; roughly speaking, no particular requirement is preferred over any other requirement.


With the uniform distribution and independence assumption, we have

$$F_j(x_j) = \begin{cases} 0, & x_j < a_j, \\ \dfrac{x_j - a_j}{b_j - a_j}, & a_j \le x_j < b_j, \\ 1, & x_j \ge b_j. \end{cases} \tag{7}$$

Notice that when $\sum_{j=1}^{n} a_j \ge a$, the maximum in (6) is negative infinity, and when $\sum_{j=1}^{n} b_j \le a$, all solutions with each $x_j \ge b_j$ are optimal. Hence, we restrict our attention to the following non-trivial case:

$$\sum_{j=1}^{n} a_j < a < \sum_{j=1}^{n} b_j. \tag{8}$$


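For convenience, we introduce the transformation

$$y_j = x_j - a_j,$$

and we let $c = a - \sum_{j=1}^{n} a_j > 0$ and $c_j = b_j - a_j > 0$. Then, when (8) is satisfied, problem (6) with equation (7) is equivalent to (the additive constant $-\sum_j \log c_j$ has been dropped)

$$\begin{aligned} \max\ & \sum_{j=1}^{n} \log y_j \\ \text{subject to: } & \sum_{j=1}^{n} y_j = c, \\ & 0 \le y_j \le c_j, \quad j = 1, \ldots, n. \end{aligned} \tag{10}$$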



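Now the optimal solution to (10) may be readily obtained by the following method. For simplicity, suppose $c_1 \le c_2 \le \cdots \le c_n$. Then the following allocation is optimal:

$$\begin{aligned} y_1 &= \min\left(c_1, \frac{c}{n}\right), \\ y_i &= \begin{cases} \min\left(c_i, \dfrac{c - \sum_{j=1}^{i-1} y_j}{n - i + 1}\right), & \text{if } y_{i-1} = c_{i-1}, \\ y_{i-1}, & \text{if } y_{i-1} < c_{i-1}, \end{cases} \qquad i = 2, \ldots, n. \end{aligned} \tag{11}$$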


The optimality of (11) can be routinely verified by showing that this solution satisfies the Kuhn-Tucker sufficient conditions. The procedure given by (11) allocates an equal amount to each activity until $y_1 = c_1$. Then equal amounts are allocated to the remaining activities until $y_2 = c_2$ or $\sum_{i=1}^{n} y_i = c$. This process is continued until $\sum_{i=1}^{n} y_i = c$.

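To illustrate, consider the following simple example:

$$c_1 = 2, \quad c_2 = 3, \quad c_3 = c_4 = 9, \quad c = 12.$$

The optimal solution from (11) is

$$y_1 = 2, \quad y_2 = 3, \quad y_3 = y_4 = 7/2,$$

since $c/4 = 3 > c_1$, then $(12 - 2)/3 = 10/3 > c_2$, and then $(12 - 5)/2 = 7/2 < c_3$.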
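The allocation rule (11) and its Kuhn-Tucker check are easy to mechanize. The sketch below is an illustration, not the author's code; it uses the equivalent form $y_i = \min\left(c_i, (c - \sum_{j<i} y_j)/(n - i + 1)\right)$ for every $i$ (once an area falls below its cap, the equal share stays constant), and exact fractions so the example reproduces $7/2$ exactly. The helper names `equal_share_allocation` and `kkt_holds` are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def equal_share_allocation(caps: Sequence[int], c: int) -> List[Fraction]:
    """Allocation (11) for problem (10); caps = (c_1, ..., c_n) in ascending order.

    Each area receives min(c_i, equal share of the effort still unallocated),
    so the smallest caps saturate first and the rest is split evenly.
    """
    if list(caps) != sorted(caps):
        raise ValueError("caps must satisfy c_1 <= c_2 <= ... <= c_n")
    n = len(caps)
    remaining = Fraction(c)
    y: List[Fraction] = []
    for i, cap in enumerate(caps):
        y.append(min(Fraction(cap), remaining / (n - i)))
        remaining -= y[-1]
    return y


def kkt_holds(y: Sequence[Fraction], caps: Sequence[int]) -> bool:
    """Kuhn-Tucker conditions for (10): the marginal value 1/y_j equals a common
    multiplier mu on areas below their cap and is at least mu on capped areas."""
    free = [1 / v for v, cap in zip(y, caps) if v < cap]
    if not free:
        return True
    mu = free[0]
    return all(m == mu for m in free) and all(
        1 / v >= mu for v, cap in zip(y, caps) if v == cap
    )


y = equal_share_allocation([2, 3, 9, 9], 12)
print(y)  # [Fraction(2, 1), Fraction(3, 1), Fraction(7, 2), Fraction(7, 2)]
```

Because $\log y_j$ is concave, passing `kkt_holds` together with $\sum_j y_j = c$ certifies optimality, which is the "routine verification" described above.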

We examine the tactical consequences of the solution given by (11). This optimal solution requires a maximal allocation of $c_j$ to certain areas, namely those areas where a probability of one can be achieved with the least amount of effort. All other areas, which have not received a maximal allocation, receive the same amount of effort, and this amount is greater than the largest $c_j$ among the areas which have achieved a probability of one.

This policy is appealing in some respects, but it has a drawback introduced by the linearity of the c.d.f.'s, namely the maximal allocation to some areas. This phenomenon would disappear with c.d.f.'s of the nonlinear type, such as those of the exponential family. Rather than pursue these points further here, we turn to the second objective of minimizing the expected shortages.


5.3  Expected Shortage Model

We formulate the expected shortage model and show that the case of independent uniform distributions reduces to a quadratic programming problem. The tactical motivation for this model is the same as for the model of the last section. Hence, we proceed directly with the formulation of the model.

Our  objective  will  be  to  minimize  the  expected  excess  of  demand 
over  supply  (allocation).  Therefore,  we  introduce  the  following 
shortage  function. 


$$\phi_j(x_j, Z) = \begin{cases} 0, & x_j > Z, \\ Z - x_j, & x_j \le Z, \end{cases} \qquad j = 1, \ldots, n, \tag{12}$$


where, as before, $x_j$ is the amount allocated to area $j$. Then $\phi_j(x_j, Z)$ is the shortage in area $j$ if $Z$ is the actual demand. Let $F_j$ be the marginal c.d.f. of $d_j$, and let $E_j(x_j)$ denote the expected value of $\phi_j(x_j, Z)$. Now we assume that each $F_j$ is sufficiently well behaved so that Stieltjes integration by parts may be performed. Then,


$$\begin{aligned} E_j(x_j) &= \int_0^\infty \phi_j(x_j, Z)\, dF_j(Z) = \int_{x_j}^\infty (Z - x_j)\, dF_j(Z) \\ &= \mu_j - x_j - \int_0^{x_j} (Z - x_j)\, dF_j(Z) \\ &= \mu_j - x_j - \left[ (Z - x_j)\, F_j(Z) \Big|_{Z=0}^{x_j} - \int_0^{x_j} F_j(Z)\, dZ \right] \\ &= \mu_j - x_j - x_j F_j(0) + \int_0^{x_j} F_j(Z)\, dZ, \end{aligned}$$

where $\mu_j$ is the mean of $d_j$. We have $F_j(0) = 0$, since $d_j$ is a non-negative random variable. Hence,

$$E_j(x_j) = \mu_j - x_j + \int_0^{x_j} F_j(Z)\, dZ. \tag{13}$$
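A quick numerical sanity check of the expression for $E_j(x_j)$ just derived: for an integer-valued demand and integer $x$, $F_j$ is constant on each interval $[k, k+1)$, so $\int_0^{x} F_j(Z)\,dZ = \sum_{k=0}^{x-1} F_j(k)$, and the formula can be compared exactly with the definition $E[\max(d_j - x, 0)]$. The demand distribution and helper names below are hypothetical.

```python
from fractions import Fraction
from typing import Dict

Pmf = Dict[int, Fraction]


def shortage_direct(pmf: Pmf, x: int) -> Fraction:
    """E[max(d - x, 0)], straight from the definition of the shortage."""
    return sum((p * max(k - x, 0) for k, p in pmf.items()), Fraction(0))


def cdf(pmf: Pmf, z: int) -> Fraction:
    """F(z) = Pr{d <= z}."""
    return sum((p for k, p in pmf.items() if k <= z), Fraction(0))


def shortage_formula(pmf: Pmf, x: int) -> Fraction:
    """mu - x + integral_0^x F(Z) dZ, with the integral of the step function F
    written as sum_{k=0}^{x-1} F(k) (valid for integer-valued d and integer x)."""
    mu = sum((k * p for k, p in pmf.items()), Fraction(0))
    return mu - x + sum((cdf(pmf, k) for k in range(x)), Fraction(0))


demand = {0: Fraction(1, 5), 1: Fraction(2, 5), 2: Fraction(2, 5)}  # hypothetical
for x in range(5):
    print(x, shortage_direct(demand, x), shortage_formula(demand, x))
```

Both columns agree for every $x$ (for example $2/5$ at $x = 1$ and $0$ once $x \ge 2$).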


According to our stated objective 2 from section 5.1 and the previously indicated constraints, we formulate the following optimization problem:

$$\begin{aligned} \min\ & \sum_{j=1}^{n} \lambda_j E_j(x_j) \\ \text{subject to: } & \sum_{j=1}^{n} x_j \le a, \qquad \text{(14a)} \\ & x_j \ge 0, \quad j = 1, \ldots, n, \end{aligned} \tag{14}$$

where the $\lambda_j$ are specified weighting factors with $\lambda_j \ge 0$ and $\sum_{j=1}^{n} \lambda_j = 1$. These weighting factors may be used to reflect the relative importance of shortages in the various areas.

Problems similar to (14) have been investigated by other authors. For example, Charnes, Cooper, and Thompson [2] investigate a general class of problems in "constrained generalized medians",* and (14) belongs to this class.

* See this reference for an extensive list of references.

Now $E_j(x_j)$ is a convex, non-increasing function of $x_j$ and, therefore, the inequality (14a) may be replaced by an equality. There are two general solution techniques which may be applied to (14). The first technique is dynamic programming. Of course, this method is especially useful when the $x_j$ are required to be non-negative integers. Dynamic programming does not require many special properties of the functions $F_j$, and it is an efficient computational technique (see Bellman and Dreyfus [1]). The other technique which may be used to solve (14) is the Charnes and Lemke [1] minimization technique for nonlinear separable convex functions. This technique is especially useful when more constraints are adjoined to (14).
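For integer units and integer-valued demands there is also a simple alternative to the two techniques named above, which we swap in here purely for illustration: classical marginal analysis (a greedy method). From the expression for $E_j(x_j)$ derived above, $E_j(x) - E_j(x+1) = 1 - F_j(x)$, which never increases in $x$, so each $E_j$ is convex on the integers and giving each successive unit to the area with the largest weighted decrease $\lambda_j (1 - F_j(x_j))$ minimizes $\sum_j \lambda_j E_j(x_j)$ under the single effort constraint. This is a sketch under those assumptions; the function names and demand distributions are hypothetical, and the method does not extend to the additional constraints for which the Charnes and Lemke technique is recommended.

```python
import heapq
from fractions import Fraction
from typing import Callable, List, Sequence

Cdf = Callable[[int], Fraction]


def greedy_allocation(cdfs: Sequence[Cdf], weights: Sequence[Fraction], total: int) -> List[int]:
    """Allocate `total` integer units to minimize sum_j lambda_j E_j(x_j).

    Each unit goes to the area with the largest weighted marginal decrease
    lambda_j * (1 - F_j(x_j)) in expected shortage.
    """
    x = [0] * len(cdfs)
    # heapq is a min-heap, so store negated decreases; ties go to the lower index.
    heap = [(-w * (1 - F(0)), j) for j, (F, w) in enumerate(zip(cdfs, weights))]
    heapq.heapify(heap)
    for _ in range(total):
        _, j = heapq.heappop(heap)
        x[j] += 1
        heapq.heappush(heap, (-weights[j] * (1 - cdfs[j](x[j])), j))
    return x


def uniform_int_cdf(top: int) -> Cdf:
    """Hypothetical demand uniform on {0, 1, ..., top}: F(k) = (k + 1) / (top + 1)."""
    return lambda k: Fraction(min(k + 1, top + 1), top + 1)


alloc = greedy_allocation([uniform_int_cdf(3), uniform_int_cdf(1)], [Fraction(1, 2)] * 2, 3)
print(alloc)  # [2, 1]
```

Here the weighted expected shortage at $(2, 1)$ is $1/8$, smaller than at $(3, 0)$ ($1/4$) or $(1, 2)$ ($3/8$).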

For purposes of comparison with the previous section, we discuss the special case when the $d_j$ are independent and uniformly distributed. Accordingly, we assume that $F_j(x_j)$ is given by equation (7). Then $E_j(x_j)$ takes the following form:

$$E_j(x_j) = \begin{cases} \mu_j - x_j, & 0 \le x_j < a_j, \\ \mu_j - x_j + \dfrac{(x_j - a_j)^2}{2(b_j - a_j)}, & a_j \le x_j < b_j, \\ 0, & b_j \le x_j, \end{cases} \tag{15}$$

where $\mu_j = a_j + \tfrac{1}{2}(b_j - a_j)$.

As may be noted, $E_j(x_j)$ is either a linear or a quadratic function of $x_j$ (see Figure 5.1 below).

Figure 5.1
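To see that the uniform-case formula above agrees with the general expression $E_j(x_j) = \mu_j - x_j + \int_0^{x_j} F_j(Z)\,dZ$, the sketch below evaluates both, the second by the trapezoidal rule. The function names and the test values $a_j = 2$, $b_j = 6$ are hypothetical.

```python
def shortage_uniform(x: float, a: float, b: float) -> float:
    """Piecewise formula for E_j(x_j) when d_j is uniform on [a, b]."""
    mu = a + (b - a) / 2
    if x < a:
        return mu - x  # linear piece: every possible demand exceeds x
    if x < b:
        return mu - x + (x - a) ** 2 / (2 * (b - a))  # quadratic piece
    return 0.0  # the allocation covers every possible demand


def shortage_by_integral(x: float, a: float, b: float, steps: int = 20000) -> float:
    """mu - x + integral_0^x F(Z) dZ, with the integral done by the trapezoidal rule."""
    def F(z: float) -> float:
        return min(max((z - a) / (b - a), 0.0), 1.0)

    mu = a + (b - a) / 2
    h = x / steps
    integral = h * ((F(0.0) + F(x)) / 2 + sum(F(i * h) for i in range(1, steps)))
    return mu - x + integral


for x in (0.0, 1.0, 3.0, 5.5, 7.0):
    print(x, shortage_uniform(x, 2.0, 6.0), shortage_by_integral(x, 2.0, 6.0))
```

The two columns agree to within the quadrature error; for instance both give $1.125$ at $x = 3$, and both vanish once $x \ge b_j$.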

To reduce our model to a quadratic programming problem, we introduce the variables $v_j$ and $w_j$ with

$$\begin{aligned} & x_j = v_j + w_j, \qquad j = 1, \ldots, n, \\ & 0 \le v_j \le a_j, \\ & 0 \le w_j \le c_j, \quad \text{where } c_j = b_j - a_j \ge 0, \qquad j = 1, \ldots, n. \end{aligned} \tag{16}$$

However, we must require $w_j = 0$ when $v_j < a_j$. This is accomplished by the nonlinear conditions

$$w_j (v_j - a_j) = 0, \qquad j = 1, \ldots, n.$$

These restrictions can be maintained by restricted basis entry. With the above change of variables, the expectation (15) becomes

$$E_j = \mu_j - (v_j + w_j) + \frac{1}{2c_j}\, w_j^2. \tag{17}$$
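$$\begin{aligned} \max\ & \sum_{j=1}^{n} \lambda_j (v_j + w_j) - \sum_{j=1}^{n} \frac{\lambda_j}{2c_j}\, w_j^2 \\ \text{subject to: } & w_j (v_j - a_j) = 0, \qquad j = 1, \ldots, n, \\ & \sum_{j=1}^{n} (v_j + w_j) = a, \\ & 0 \le v_j \le a_j, \qquad j = 1, \ldots, n, \\ & 0 \le w_j \le c_j, \qquad j = 1, \ldots, n. \end{aligned} \tag{18}$$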

Any of the standard quadratic programming methods can be used to solve (18). Of course, bounded variable techniques can also be employed to substantially reduce the size of the constraint set and thereby improve computational efficiency.

For the uniform case, we compare the expected shortage model to the probability model. We take $\lambda_j = 1/n$, $j = 1, \ldots, n$, in (18), since the probability model maximizes the probability that all missions are simultaneously accomplished, and therefore the missions are equally weighted. Notice that we will have a nontrivial optimal solution for (18) when $\sum_{j=1}^{n} a_j \ge a$. This was not true for the probability model. To obtain

* The constant $\sum_j \lambda_j \mu_j$ has been dropped from the objective function, and the optimization has been changed to maximization by multiplying the objective function by $-1$. We have also excluded the trivial case $\sum_j b_j \le a$, and this permits us to write equality in the second constraint.
5.4  Chance-Constrained Distribution Model

In this section we introduce a more complicated model of a multiple contact problem. This model also utilizes the concept of random demands which are required to accomplish a specified mission. However, here we explicitly recognize that the units of effort may come from different origins. The problem is to minimize the expected distribution time to accomplish each mission with at least a specified probability. We develop a chance-constrained distribution model of this problem.

To formulate the model, suppose that $a_i$ units are available at some location (origin) $i$ ($i = 1, \ldots, m$), where $a_i$ is a given non-negative integer. Further, assume that the number of units required to accomplish the specified mission at some location (destination) $j$ ($j = 1, \ldots, n$) is a non-negative discrete random variable $d_j$ with the known marginal c.d.f. $F_j$. Of course, some of the origins and destinations may coincide. Also, $F_j$ may be a degenerate distribution for some destinations, giving rise to a deterministic requirement.

The above assumptions lead directly to distribution-type constraints. One set of constraints is as follows: the amount sent from any origin cannot exceed the amount available. Then, letting $x_{ij}$ be the amount sent from origin $i$ to destination $j$, we get

$$\sum_{j=1}^{n} x_{ij} \le a_i, \qquad i = 1, \ldots, m. \tag{20}$$

The second set of constraints is written to reflect the random nature of the requirements. The value of the random variable $d_j$ is observed after the allocations are made. In the face of this uncertainty, we employ the ingenious notion of chance constraints due to Charnes and Cooper [2]. These constraints are*

$$\Pr\left\{ \sum_{i=1}^{m} x_{ij} \ge d_j \right\} \ge c_j, \qquad j = 1, \ldots, n, \tag{21}$$

where the $c_j$ are specified constants. The double inequality in (21) reads as follows: the number of units sent to destination $j$, $\sum_{i=1}^{m} x_{ij}$, must meet or exceed the actual requirement at least $100c_j\%$ of the time. Hence, these constraints guarantee a stipulated level of protection against shortages at each destination. In addition, we place non-negativity restrictions on the $x_{ij}$.

As mentioned previously, our objective will be to minimize the expected distribution time. Accordingly, we let $t_{ij}$ be the time for one unit to travel from origin $i$ to destination $j$, where the $t_{ij}$ are random variables with known means. Then, our objective is

$$\min\ E\left\{ \sum_i \sum_j t_{ij}\, x_{ij} \right\}, \tag{22}$$

* "Pr" denotes probability in equation (21).
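where "E" is the expectation operator. We let $\hat t_{ij}$ be the expected value of $t_{ij}$. Then, bringing together the above objective and the already indicated constraints, we obtain the following optimization problem:*

$$\begin{aligned} \min\ & \sum_i \sum_j \hat t_{ij}\, x_{ij} \\ \text{subject to: } & \sum_j x_{ij} \le a_i, \qquad i = 1, \ldots, m, \\ & \Pr\left\{ \sum_i x_{ij} \ge d_j \right\} \ge c_j, \qquad j = 1, \ldots, n, \qquad \text{(23a)} \\ & x_{ij} \ge 0. \end{aligned} \tag{23}$$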

Problem (23) is a distribution model with the chance constraints (23a). Fortunately, we can solve this model by obtaining an equivalent distribution model with no random elements, a deterministic equivalent.**

To proceed, we rewrite equation (23a) in terms of the known c.d.f.'s, i.e.,

$$F_j(x_j) \ge c_j, \qquad j = 1, \ldots, n, \tag{24}$$

where

$$x_j = \sum_i x_{ij}.$$

Recall that $d_j$ is a discrete random variable, and let

$$p_{jk} = \Pr\{d_j \le k\} = F_j(k), \qquad k = 0, 1, 2, \ldots, \quad j = 1, \ldots, n.$$

* To reduce the objective function, we use the fact that the expectation of a sum of random variables is the sum of the expectations.

** The approach used is due to Charnes and Cooper [4].
To obtain the deterministic equivalent, we define the function $\hat F_j$, which is a pseudo-inverse of $F_j$:

$$\hat F_j(y) = \begin{cases} 0, & y \le 0, \\ k, & p_{j,k-1} < y \le p_{jk}, \quad k = 0, 1, 2, \ldots, \end{cases}$$

with the convention $p_{j,-1} = 0$. For each $j$, the relationship between $F_j$ and $\hat F_j$ is depicted in Figure 5.2 below.


Figure 5.2  $F_j(x)$ for $x = 0, 1, 2, 3$ and its pseudo-inverse $\hat F_j(y)$, with breakpoints $p_{j0}, p_{j1}, p_{j2}, 1$ on the $y$ axis


The following lemma provides the wanted reduction of equation (24).

Lemma 1: $F_j(x_j) \ge c_j$ if and only if $x_j \ge \hat F_j(c_j)$.

Proof: Assume for fixed $j$ that $x_j$ satisfies $F_j(x_j) \ge c_j$. Then, since $F_j$ is monotone non-decreasing, $x_j \ge k$, where $k$ is the unique integer which satisfies

$$p_{j,k-1} < c_j \le p_{jk}$$

(if $x_j < k$, then $F_j(x_j) \le F_j(k-1) = p_{j,k-1} < c_j$, since $d_j$ takes only integer values). Now, by definition of $\hat F_j$, $\hat F_j(c_j) = k$; hence,

$$x_j \ge \hat F_j(c_j).$$

To prove the opposite implication, assume for fixed $j$ that $x_j$ satisfies $x_j \ge \hat F_j(c_j)$. Then $x_j \ge k$, where $k$ is the unique integer which satisfies

$$p_{j,k-1} < c_j \le p_{jk}.$$

But $F_j$ is monotone non-decreasing; thus,

$$F_j(x_j) \ge F_j(k).$$

Furthermore,

$$F_j(k) = p_{jk} \ge c_j.$$

Hence, $F_j(x_j) \ge c_j$, and the lemma is proved.
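The pseudo-inverse $\hat F_j$ is simply "the first index whose cumulative probability reaches $y$", which Python's standard `bisect.bisect_left` computes on the sorted list $(p_{j0}, p_{j1}, \ldots)$. The sketch below, an illustration rather than part of the original development, uses it to check Lemma 1 exhaustively on a small grid. The c.d.f. values and helper names are hypothetical, and exact fractions are used so that boundary cases such as $c_j = p_{jk}$ are compared exactly.

```python
from bisect import bisect_left
from fractions import Fraction
from typing import Sequence


def pseudo_inverse(p: Sequence[Fraction], y: Fraction) -> int:
    """hat F_j(y) for p = (p_j0, p_j1, ...), non-decreasing and ending at 1.

    Returns 0 for y <= 0, otherwise the unique k with p_{j,k-1} < y <= p_jk
    (taking p_{j,-1} = 0). Assumes y <= 1.
    """
    if y <= 0:
        return 0
    return bisect_left(p, y)  # first index k with p[k] >= y


def cdf(p: Sequence[Fraction], x: int) -> Fraction:
    """F_j(x) for integer x >= 0 (the last entry of p is 1)."""
    return p[min(x, len(p) - 1)]


def lemma1_holds(p: Sequence[Fraction], levels: Sequence[Fraction]) -> bool:
    """Check F_j(x) >= c  <=>  x >= hat F_j(c) for every c in levels and small x."""
    return all(
        (cdf(p, x) >= c) == (x >= pseudo_inverse(p, c))
        for c in levels
        for x in range(len(p) + 2)
    )


p = [Fraction(2, 10), Fraction(6, 10), Fraction(1)]  # hypothetical c.d.f. values
levels = [Fraction(k, 10) for k in range(1, 11)]
print(lemma1_holds(p, levels))  # True
```

Note that `bisect_left` (rather than `bisect_right`) is what makes the interval closed on the right, matching $p_{j,k-1} < y \le p_{jk}$.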

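From Lemma 1, problem (23) and the following problem are equivalent:

$$\begin{aligned} \min\ & \sum_i \sum_j \hat t_{ij}\, x_{ij} \\ \text{subject to: } & \sum_j x_{ij} \le a_i, \qquad i = 1, \ldots, m, \\ & \sum_i x_{ij} \ge \hat F_j(c_j), \qquad j = 1, \ldots, n, \qquad \text{(26a)} \\ & x_{ij} \ge 0. \end{aligned} \tag{26}$$

Since problem (26) is a distribution model, the following well-known properties of (26), or equivalently (23), are immediately available:*

* See Charnes and Cooper [5].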
1. Problem (26) has an optimal solution if and only if

$$\sum_i a_i \ge \sum_j \hat F_j(c_j).$$

2. Since the $\hat F_j(c_j)$, $j = 1, \ldots, n$, are integers and the $a_i$, $i = 1, \ldots, m$, are assumed to be integers, (26) has an optimal integer extreme point solution.

3. The inequalities in (26a) may be replaced by equalities without changing the values or the existence of optimal solutions (since all $\hat t_{ij} \ge 0$).

Property 2 above is especially useful because no special integer techniques are required to obtain an integer solution.

It is interesting to note that (26) is infeasible if there are not enough units available to obtain the stipulated confidence levels in (23). Of course, feasibility may be secured by reducing the value of some $c_j$ or increasing the amounts available. One of the important features of (23) is that its deterministic equivalent (26) has a dual and, therefore, dual interpretations can be obtained. The dual evaluators indicate the change in the objective function per unit change in $a_i$ or in $\hat F_j(c_j)$. This leads to an immediate evaluation of the effect of a change in $c_j$ on the optimal solution. Of course, this effect is discontinuous, since $\hat F_j$ is a step function (see Figure 5.2). In addition to dual evaluation, sensitivity and parametric studies can also be implemented.


We conclude the discussion of this model with an example. Suppose that a known number of ships is available at each of four origins and that specified missions are to be accomplished at each of six destinations during the planning horizon. This initial tactical configuration is shown in Figure 5.3.


Figure 5.3  Initial tactical configuration of origins $O_1, \ldots, O_4$ and destinations $D_1, \ldots, D_6$ ($O_i$ denotes origin $i$ and $D_j$ denotes destination $j$)


Some of the above origins and destinations coincide, since units are both available and required at these points. The tactical information pertaining to the destinations is given in the following Table 5.1, in which we have also computed $\hat F_j(c_j)$.
j  j 
$$\begin{array}{c|ccc|c|c} & F_j(0) & F_j(1) & F_j(2) & c_j & \hat F_j(c_j) \\ \hline D_1 & 0.2 & 0.6 & 1.0 & 0.7 & 2 \\ D_2 & 0.5 & 0.8 & 1.0 & 0.9 & 2 \\ D_3 & 0.4 & 0.9 & 1.0 & 0.8 & 1 \\ D_4 & 0 & 0 & 1.0 & 1.0 & 2 \\ D_5 & 0.7 & 0.9 & 1.0 & 0.6 & 0 \\ D_6 & 0.7 & 0.9 & 1.0 & 0.9 & 1 \end{array}$$

Table 5.1  Demand Distributions

The mean travel times from each origin to each destination and the amount available at each origin are given in Table 5.2 below.

Table 5.2  Travel Times and Amounts Available
As may be noted from Table 5.1, some destinations are more "critical" than others, for certain requirements must be satisfied with higher probability. In this way, the constants $c_j$ reflect the relative importance of the missions. Also from Table 5.1, a demand already exists at $D_4$, so that 2 units must be sent there with probability 1. Finally, no units are required at $D_5$ to attain the stipulated level of protection against shortages, and $D_5$ is therefore deleted from the problem.

The  data  from  Tables  5.  1  and  5.  2  is  assembled  in  the  following 
distribution  tableau. 


Table 5.3  Optimal Tableau (rows $O_1, \ldots, O_4$; columns $D_1$, $D_2$, $D_3$, $D_4$, $D_6$, Dummy, Avail; requirements 2, 2, 1, 2, 1, 1; optimal expected time = 19)
The optimal solution is circled in Table 5.3. An alternate optimum is also available; both optimal solutions are shown in Figure 5.4.


[Figure 5.4: Optimal Distribution Schedules]

As we have mentioned, a number of sensitivity studies can be conducted. Most of these are well understood, and we do not dwell on them here. However, we have computed the variation in $c_j$ that is allowed before the requirements $F_j^{-1}(c_j)$ change. These computations are made from Table 5.1, and the allowable ranges are:


$$0.6 < c_1 \le 1.0, \qquad 0.8 < c_2 \le 1.0, \qquad 0.4 < c_3 \le 0.9,$$

$$0 < c_5 \le 0.7, \qquad 0.7 < c_6 \le 0.9.$$

Because the requirements are discrete random variables, considerable variation of the risk coefficients $c_j$ is allowed without changing the requirements in the deterministic equivalent. Nevertheless, the optimal solution is sensitive to changes in $c_j$ if $c_j$ is near the extremes of its allowable range.
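The ranges above follow directly from the c.d.f. values: with requirement $k = F_j^{-1}(c_j)$, the requirement is unchanged for $F_j(k-1) < c_j \le F_j(k)$. A minimal sketch, where the helper name `allowable_range` is hypothetical; D4 is omitted, as in the text, since $F_4(1) = 0$ makes its requirement 2 for every positive $c_4$.

```python
def allowable_range(cdf, k):
    """Return (lower, upper) such that F^{-1}(c) = k for lower < c <= upper.

    lower is F(k - 1), or 0 when k == 0 (then c = 0 also gives k = 0);
    upper is F(k).
    """
    lower = cdf[k - 1] if k > 0 else 0.0
    return lower, cdf[k]


# c.d.f. values from Table 5.1 and the requirements F_j^{-1}(c_j) listed there.
CDFS = {
    "D1": [0.2, 0.6, 1.0],
    "D2": [0.5, 0.8, 1.0],
    "D3": [0.4, 0.9, 1.0],
    "D5": [0.7, 0.9, 1.0],
    "D6": [0.7, 0.9, 1.0],
}
REQS = {"D1": 2, "D2": 2, "D3": 1, "D5": 0, "D6": 1}

ranges = {j: allowable_range(CDFS[j], REQS[j]) for j in CDFS}
for j, (low, high) in ranges.items():
    print(f"{low} < c_{j[1:]} <= {high}")
```

The printed intervals agree with the five ranges stated in the text.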

To sum up, the model presented in this section determines an optimal (minimum expected time) distribution of units from their original locations to contact areas. The number of units which will be required to accomplish the mission in each area is not known in advance, but these requirements are distributed according to a known c.d.f. We required that the number of units sent to each contact area satisfy the actual requirement with at least a specified probability, i.e., that the mission be accomplished with at least this specified probability. The model was then reduced to an equivalent distribution model with no random elements (a deterministic equivalent). Finally, an example was given to illustrate these ideas.
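For very small instances, the deterministic equivalent (minimize $\sum_i \sum_j \bar t_{ij} x_{ij}$ subject to $\sum_j x_{ij} \le a_i$ and $\sum_i x_{ij} \ge F_j^{-1}(c_j)$) can be checked by enumeration. This is only a sketch: the helper names `compositions` and `solve_distribution` are hypothetical, the travel times and availabilities in the usage example are invented for illustration, and excess supply is handled by the $\le$ constraints rather than by a Dummy column. A realistic instance would be solved with a transportation or linear programming code.

```python
from itertools import product


def compositions(total, parts):
    """All ways to write `total` as an ordered sum of `parts` nonnegative integers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def solve_distribution(times, avail, req):
    """Brute-force minimum expected time plan for the deterministic equivalent.

    times[i][j]: mean travel time from origin i to destination j
    avail[i]:    units available at origin i
    req[j]:      requirement F_j^{-1}(c_j) at destination j
    Each destination receives exactly req[j] units, since sending more can only
    add time when travel times are nonnegative. Returns (None, None) if infeasible.
    """
    m, n = len(avail), len(req)
    best_cost, best_plan = None, None
    # For each destination, choose how its requirement is split among the origins.
    for columns in product(*(compositions(req[j], m) for j in range(n))):
        shipped = [sum(col[i] for col in columns) for i in range(m)]
        if any(s > a for s, a in zip(shipped, avail)):
            continue  # violates an availability constraint
        cost = sum(times[i][j] * columns[j][i] for i in range(m) for j in range(n))
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_plan = [[columns[j][i] for j in range(n)] for i in range(m)]
    return best_cost, best_plan


# Hypothetical instance: 2 origins, 2 destinations.
print(solve_distribution([[2, 4], [3, 1]], [1, 3], [2, 1]))  # (6, [[1, 0], [1, 1]])
```

Because origin 1 has only one unit in this invented instance, the second unit for destination 1 must come from the slower origin 2.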

5.5  A Dynamic Distribution Model

We extend the model of the previous section to two periods. This extension is dynamic because the two periods are coupled together by using the same units in each period. The problem is to minimize the expected distribution time, subject to constraints on the amounts available and chance constraints on the requirements. This model is reduced to a deterministic equivalent by the use of a zero-order decision rule for each period. A method is also proposed which allows the decision variables in the second period to depend on the actual requirements observed in the first period. The method utilizes only zero-order decision rules.

To formulate the model, we assume that allocations are made at the beginning of periods 1 and 2. We number the locations where units are available and/or required as $j = 1, \dots, n$. Because of this numbering system, some of these locations may have either nothing available or nothing required. Let $d_j^k$ be the number of units required to accomplish the specified mission at location $j$ in period $k = 1, 2$. Now $d_j^1$ is observed after the first and before the second allocations are made, while $d_j^2$ is observed after the second allocations are made. The $d_j^k$ are discrete random variables with a known joint c.d.f. Let $x_{ij}^k$ be the number of units sent from location $i$ to $j$ ($i, j = 1, \dots, n$) in period $k = 1, 2$, and let $a_i$ be the number of units available at location $i$ initially.

We  introduce  the  following  distribution  constraints: 

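$$\sum_j x_{ij}^1 \le a_i, \qquad i = 1, \dots, n \tag{27}$$

$$\Pr\Big\{ \sum_i x_{ij}^1 \ge d_j^1 \Big\} \ge c_j^1, \qquad j = 1, \dots, n \tag{28}$$

$$\sum_j x_{ij}^2 - \sum_j x_{ji}^1 + \sum_j x_{ij}^1 \le a_i, \qquad i = 1, \dots, n \tag{29}$$

$$\Pr\Big\{ \sum_i x_{ij}^2 \ge d_j^2 \Big\} \ge c_j^2, \qquad j = 1, \dots, n \tag{30}$$

$$x_{ij}^1,\; x_{ij}^2 \ge 0 \tag{31}$$

where $c_j^1$ and $c_j^2$ are specified constants with $0 \le c_j^k \le 1$.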

Constraint (29) is the only type not encountered in the last section. It requires that the amount sent from location $i$ in period 2 not exceed the amount available there, $a_i + \sum_j x_{ji}^1 - \sum_j x_{ij}^1$. This constraint couples the distribution models of the two periods.
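The coupling in (29) can be made concrete: after period 1, location $i$ holds its initial $a_i$ plus the units received minus the units sent. The sketch below checks (27), (29) and (31) for given plans; the function names and the three-location data are hypothetical, and list indices are 0-based.

```python
def period2_available(a, x1):
    """Units at each location after period 1: a_i + sum_j x1[j][i] - sum_j x1[i][j]."""
    n = len(a)
    return [a[i] + sum(x1[j][i] for j in range(n)) - sum(x1[i]) for i in range(n)]


def satisfies_supply_constraints(a, x1, x2):
    """Check constraints (27), (29) and (31) for given period-1 and period-2 plans."""
    n = len(a)
    nonnegative = all(v >= 0 for row in x1 + x2 for v in row)    # (31)
    period1_ok = all(sum(x1[i]) <= a[i] for i in range(n))         # (27)
    avail2 = period2_available(a, x1)
    period2_ok = all(sum(x2[i]) <= avail2[i] for i in range(n))    # (29)
    return nonnegative and period1_ok and period2_ok


# Hypothetical example: location 0 sends 2 units to location 1,
# location 2 sends 1 unit to location 0 in period 1.
a = [3, 0, 1]
x1 = [[0, 2, 0], [0, 0, 0], [1, 0, 0]]
print(period2_available(a, x1))  # [2, 2, 0]
```

A period-2 plan that ships anything out of location 2 is then infeasible, even though location 2 started with one unit.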

Let $t_{ij}^k$ be the time taken by one unit to go from location $i$ to $j$ in period $k$ ($t_{ij}^k$ may be a random variable). Our objective is to minimize the total expected distribution time, i.e.,

$$\min \sum_i \sum_j \left( \bar{t}_{ij}^1 x_{ij}^1 + \bar{t}_{ij}^2 x_{ij}^2 \right) \tag{32}$$

where $\bar{t}_{ij}^1$ and $\bar{t}_{ij}^2$ denote the means of $t_{ij}^1$ and $t_{ij}^2$, respectively.
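When the $x_{ij}^k$ are fixed numbers, as under zero-order decision rules, the expected total time is linear in the random $t_{ij}^k$, so (32) may be written with the means. The check below uses two equally likely, hypothetical realizations of the travel times and exact fractions; `total_time` is our own helper name.

```python
from fractions import Fraction


def total_time(t1, t2, x1, x2):
    """sum_ij (t1_ij * x1_ij + t2_ij * x2_ij) for one realization of the times."""
    n = len(x1)
    return sum(t1[i][j] * x1[i][j] + t2[i][j] * x2[i][j]
               for i in range(n) for j in range(n))


# Hypothetical, equally likely realizations of the period-1 and period-2 times.
scenarios = [
    (Fraction(1, 2), [[0, 1], [2, 0]], [[0, 4], [3, 0]]),
    (Fraction(1, 2), [[0, 3], [2, 0]], [[0, 2], [5, 0]]),
]
x1 = [[0, 2], [0, 0]]  # zero-order rule: fixed numbers chosen before times are known
x2 = [[0, 0], [1, 0]]

# Expected total time, computed scenario by scenario.
expected = sum(p * total_time(t1, t2, x1, x2) for p, t1, t2 in scenarios)

# Objective (32): the same sum evaluated at the mean times.
n = len(x1)
tbar1 = [[sum(p * t1[i][j] for p, t1, _ in scenarios) for j in range(n)] for i in range(n)]
tbar2 = [[sum(p * t2[i][j] for p, _, t2 in scenarios) for j in range(n)] for i in range(n)]
print(expected, total_time(tbar1, tbar2, x1, x2))  # 8 8
```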

We assume that both $x_{ij}^1$ and $x_{ij}^2$ are determined by zero-order decision rules; then, by Lemma 1 of the last section, equations (28)




of  (34). 
such as the mixing routine of Charnes and Cooper [5] are available to exploit this special structure and thereby reduce computational effort. However, for multiple contact situations, $n$ would be on the order of 10. In this case, (34) is a linear program with 200 variables and 40 constraints. This is not a large problem for modern linear programming codes. Therefore, we do not dwell on special methods of computation.
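The figures of 200 variables and 40 constraints are consistent with the following count, which assumes that (34) has one variable $x_{ij}^k$ per ordered pair $(i, j)$ and period $k$, and one deterministic constraint per location for each of (27) through (30), with (31) treated as variable bounds. The function `lp_size` is a hypothetical helper.

```python
def lp_size(n, periods=2):
    """Count variables and constraints of the deterministic-equivalent LP.

    Assumes one variable x_ij^k per ordered pair (i, j) and period k, plus one
    availability constraint ((27) or (29)) and one requirement constraint
    ((28) or (30)) per location and period; the nonnegativity conditions (31)
    are treated as variable bounds rather than counted.
    """
    variables = periods * n * n
    constraints = periods * 2 * n
    return variables, constraints


print(lp_size(10))  # (200, 40)
```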

We propose a method for implementing (34). An optimal solution to (34) yields optimal $x_{ij}^1$ and $x_{ij}^2$. Instead of using both of these optimal distribution plans, one can employ the following procedure. Use the optimal $x_{ij}^1$ from (34). The $x_{ij}^2$ which are actually employed are obtained by solving the one-period distribution model of Section 5.4, with $F_j$ being the conditional distribution of $d_j^2$ given the actual values of $d_j^1$ which have been observed. To reiterate, the optimal $x_{ij}^1$ from (34) are used, but the $x_{ij}^2$ from (34) are not. Rather, we determine conditionally optimal $x_{ij}^2$ given the actual values of $d_j^1$ which are observed.
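The conditional step can be sketched for one location: from a joint p.m.f. of $(d_j^1, d_j^2)$, form the conditional c.d.f. of $d_j^2$ given the observed $d_j^1$, then take the smallest $k$ whose conditional c.d.f. reaches $c_j^2$. The joint p.m.f. below is hypothetical, as are the helper names; the resulting requirements would feed the one-period model of Section 5.4 in period 2.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. of (d_j^1, d_j^2) at one location j.
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10), (0, 2): Fraction(0, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 10), (1, 2): Fraction(3, 10),
}


def conditional_cdf(joint, d1_observed):
    """Return {k: Pr{d2 <= k | d1 = d1_observed}} for the values k of d2."""
    row = {v: p for (u, v), p in joint.items() if u == d1_observed}
    total = sum(row.values())
    if total == 0:
        raise ValueError("the observed d1 has probability zero")
    cdf, running = {}, Fraction(0)
    for v in sorted(row):
        running += row[v]
        cdf[v] = running / total
    return cdf


def requirement(cdf, c):
    """Smallest k with F(k) >= c, applied to the conditional c.d.f."""
    for k in sorted(cdf):
        if cdf[k] >= c:
            return k
    raise ValueError("c exceeds the largest c.d.f. value")


c = Fraction(9, 10)
for d1 in (0, 1):
    print(d1, requirement(conditional_cdf(joint, d1), c))
```

With these numbers the unconditional c.d.f. of $d_j^2$ is 0.4, 0.7, 1.0, giving a requirement of 2 at $c = 0.9$; observing $d_j^1 = 0$ lowers the period-2 requirement to 1, while $d_j^1 = 1$ leaves it at 2.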

This dynamic model can also be formulated with a linear decision rule. Only a verbal description of the procedure is given here, since no new results are obtained when such a rule is employed. We would write $x_{ij}^2$ as an unknown linear combination of the random variables $d_j^1$. These expressions for $x_{ij}^2$ are then substituted in (34), and a deterministic equivalent for the case of normally distributed $d_j^1$ and $d_j^2$ can be obtained by the method of Charnes and Cooper [4]. The optimal solution to the resulting model yields an optimal linear decision rule for the $x_{ij}^2$. The $x_{ij}^2$ which are employed will, therefore, depend on the actual values of $d_j^1$ which are observed. Hence, the procedure above yields an adaptive model.
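The sketch below does not reproduce the Charnes and Cooper [4] procedure. It only illustrates the standard normal-quantile reduction of a single chance constraint, for one destination and a hypothetical rule in which the total sent in period 2 is $b_0 + b_1 d^1$. If $(d^1, d^2)$ is bivariate normal, then $Y = b_0 + b_1 d^1 - d^2$ is normal, and $\Pr\{Y \ge 0\} \ge c$ holds exactly when $E[Y] - \Phi^{-1}(c)\,\mathrm{sd}(Y) \ge 0$. Units are treated as continuous here, and all parameter values and function names are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def shortfall_moments(b0, b1, m1, m2, s1, s2, rho):
    """Mean and s.d. of Y = b0 + b1*d1 - d2 when (d1, d2) is bivariate normal."""
    mean = b0 + b1 * m1 - m2
    var = (b1 * s1) ** 2 + s2 ** 2 - 2 * b1 * rho * s1 * s2
    return mean, sqrt(var)


def chance_constraint_holds(b0, b1, m1, m2, s1, s2, rho, c):
    """Deterministic form of Pr{b0 + b1*d1 >= d2} >= c: mean - z_c * sd >= 0."""
    mean, sd = shortfall_moments(b0, b1, m1, m2, s1, s2, rho)
    return mean - NormalDist().inv_cdf(c) * sd >= 0


def probability_met(b0, b1, m1, m2, s1, s2, rho):
    """Pr{Y >= 0}, computed directly from the normal c.d.f. for comparison."""
    mean, sd = shortfall_moments(b0, b1, m1, m2, s1, s2, rho)
    return 1 - NormalDist(mean, sd).cdf(0)


# Hypothetical demand parameters at one location.
params = dict(m1=5.0, m2=5.0, s1=1.0, s2=1.0, rho=0.8)
for b0, b1 in [(2.0, 0.8), (1.5, 0.8), (6.0, 0.0), (6.3, 0.0)]:
    print(b0, b1, chance_constraint_holds(b0, b1, c=0.9, **params),
          round(probability_met(b0, b1, **params), 3))
```

With these assumed parameters, a zero-order rule ($b_1 = 0$) needs a constant shipment of about $5 + 1.2816 \approx 6.28$ units to meet $c = 0.9$, whereas the linear rule $2 + 0.8\,d^1$ ships 6 units on average and still meets it, because observing $d^1$ removes part of the uncertainty in $d^2$.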